
Imperial Council of Agricultural Research, India




A HANDBOOK OF STATISTICS 

FOR USE IN 






PLANT BREEDING AND AGRICULTURAL PROBLEMS 


BY 

F. J. F. SHAW, C.I.E., D.Sc., A.R.C.S., F.L.S., 

Imperial Economic Botanist and Director, Imperial Institute of Agricultural
Research, Pusa.



DELHI : MANAGER OF PUBLICATIONS
1936 

Price Rs. 4-6 or 7s. 3d, 







FOREWORD


The application of statistical tests to experimental results is a subject the
importance of which for the biologist has greatly increased in recent years. Modern
statistical text books, however, are generally written by mathematicians and describe
the subject in a manner which is obscure to the biological student whose knowledge
of mathematics is elementary. It is hoped that this book will provide an easy guide
for the students in Indian universities and agricultural colleges and for workers in
agricultural departments, in the application of statistical methods to the class of
experiments which commonly confront the worker in plant-breeding and agricultural
problems. It is not intended as a treatise in mathematical statistics but aims
at giving in simple language the methods by which statistics may be used to test
the significance of the results of experiments. The book is based on a part of the
post-graduate course in plant-breeding which is given in the Botanical Section of
the Imperial Institute of Agricultural Research, Pusa, and all the examples given
in the book are taken from experiments actually carried out in India and Burma.

The mathematical tables which are necessary for the use of the student who is
studying statistics are being published by the Imperial Council of Agricultural
Research, and will be readily available for workers in India.


F. J. F. SHAW. 








CONTENTS

Chapter I. Sampling
Chapter II. Frequency distributions and averages
Chapter III. Measurements of dispersion
Chapter IV. Probability and goodness of fit
Chapter V. Probability integral
Chapter VI. Significance of differences between means
Chapter VII. Correlation, regression and long-time trends
Chapter VIII. Experimental technique of field trials
Chapter IX. Statistical interpretation of simple field experiments
Chapter X. Statistical interpretation of complex and serial experiments
Chapter XI. Soil heterogeneity and the analysis of covariance

Appendix I. Use of logarithms
Appendix II. Important formulae
Appendix III. Snedecor's Tables for the values of F and Fisher's t





(Received for publication on 2nd November 1934.)

CHAPTER I 
SAMPLING 

Statistics is a branch of applied mathematics which deals with observations. In
biology our observations deal with living things, either the size of parts or organisms
or their frequency of occurrence, and the word BIOMETRY is applied to this branch
of statistics. Galton, Pearson, Udny Yule, Fisher, Pearl and Elderton are among
the foremost names in the development of what is a relatively young branch of
biological science.

In any series of observations it is obviously impossible that our observations 
should include all the individuals in the universe or even all the individuals in that
population which is accessible. Observations are, therefore, based on samples taken 
from a population and our first care must be that the sample is a true representative 
of the population. 

Samples consist of a number of variables which may be quantitative or qualitative,
continuous or discrete. Any quantity or quality which changes is called
the variable. The observations which represent the changes in the variable are
called variates. A quantitative variate can be expressed by a number, e.g., height
of men in inches ; in a sample of the males of a population a man 68 inches tall is a
variate of 68 inches. A qualitative variate is distinguished by some quality ; for
example, in a crop of hybrid linseeds individuals with coloured petals can be
differentiated from individuals with white petals, and the frequency of occurrence of each
class estimated. Quantitative variates are generally continuous, that is to say, they
exhibit all gradations in size development throughout the sample. Discrete variates
differ from one another by finite gradations without any intermediate stages, so that
each variate has a distinct and separate value and fractions of a unit cannot occur,
e.g., number of petals in a flower, number of Europeans in India, etc. From a sample
we can calculate a number of statistical constants (e.g., the average) which convey
an idea of the nature of the sample and which summarize briefly its characters.

Theory of sampling

A sample is taken to yield information about a larger bulk or population of 
individuals, and in order that it may be truly representative of the population it 
must be so chosen that each individual in the population has an equal and indepen- 
dent chance of being included. A statistical enquiry into the height of men in 
India could not be carried out on a sample taken exclusively in Bihar, since owing 
to racial and geographical factors a sample taken from Bihar is not truly represen- 
tative of the whole population of India. Again a sample of a thousand men which 
consisted of 500 pairs of brothers would not satisfy the requirements of a true sample, 
since there is a tendency for brothers to be alike, in other words, in such a sample 
the variates would be dependent on one another. The conditions under which a 
sample may be expected to present the characters of the universe from which it has 
been chosen are : — 

(i) Independence. The variates which make up the population must be inde- 
pendent of one another and must each have the same chance of inclusion in the 
sample. 

(ii) Homogeneity. The universe from which the sample is taken should consist 
of individuals of the same kind under similar conditions. 

Methods of sampling

(1) Random selection. This means selection in such a way that every individual 
in the universe to be sampled has an equal chance of inclusion in the sample. This
could be achieved by allocating numbers to all individuals in the universe and draw- 
ing numbered tickets blindly from a bag, the numbers so drawn indicating the in- 
dividuals to be included in the sample. It is not, however, always practicable to 
number all the individuals in a universe, for instance, if it is desired to sample the 
heads in a field of wheat the labour involved in allocating a number to every head 
in a field of only one acre is obviously prohibitive. Under these circumstances 
the method pursued at Pusa is to make a random selection of heads from the field 
which is numerically much greater than the sample that is desired. These heads 
are then numbered and a random drawing of numbers is taken. If a sample of 400 
ear-heads is desired the practice at Pusa is to make a random collection of 1,200 
heads from the field. It is obvious that in such cases much depends on the true 
randomness of the original collection in the field. Our method is to set men to walk 
diagonally across and up and down the field, each man plucking a head on his right 
or left hand at every pace without any conscious selection. Men are trained to 
stoop and pluck a stalk from near the ground in order that individuals of short 
stature may have a fair chance of inclusion in the collection. 

A word of caution is necessary with regard to the drawing of the numbered 
tickets. When the bag contains 1,200 tickets, the odds are 1,199 : 1 against the 
drawing of any particular variate, but when 390 tickets have been withdrawn the
odds are only 809 : 1 against any particular number being taken. It is, therefore, 
sometimes desirable to cancel a ticket when drawn and to replace it in the bag so as 
to keep the total number of tickets in the bag constant. If a cancelled ticket is 
drawn a second time, the drawing is neglected. 

(2) Spatial selection. The selection of individuals at regular intervals. This 
method is suitable to sampling when the universe to be sampled consists of a field 
in which plants are sown in regular lines. It is then possible to select every 5th, 
10th, or 20th plant in the lines according to the size of sample desired. If more 
convenient, selection by measurement, taking 5, 10 or 20 feet as the space between
selected individuals, can also be done.

(3) Selection by design. If the universe to be sampled is not homogeneous then 
the sample should contain members from each of the classes constituting the uni- 
verse in the proportions in which those classes exist in the universe. If a sample 
is to be taken from the human population in India, individuals from every race must 
be included in the sample and the proportion in the sample of any one race should 
be the same as the proportion of that race in the entire population. 
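
The three methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the class labels "A", "B", "C" and their sizes are hypothetical, and the ticket drawing follows the "cancel and replace" practice described in the word of caution, so that the bag always holds the full number of tickets.

```python
import random


def odds_against(tickets_in_bag):
    """Odds (x : 1) against drawing one particular ticket from the bag."""
    return tickets_in_bag - 1


def draw_tickets(n_tickets, sample_size, seed=None):
    """Random selection: each drawn ticket is cancelled and put back, so the
    bag always holds n_tickets; a cancelled ticket drawn again is neglected."""
    if sample_size > n_tickets:
        raise ValueError("sample cannot exceed the number of tickets")
    rng = random.Random(seed)
    drawn, cancelled = [], set()
    while len(drawn) < sample_size:
        ticket = rng.randint(1, n_tickets)  # inclusive at both ends
        if ticket in cancelled:
            continue  # second drawing of a cancelled ticket is ignored
        cancelled.add(ticket)
        drawn.append(ticket)
    return drawn


def every_kth(plants, k):
    """Spatial selection: the k-th, 2k-th, 3k-th ... plant along a line."""
    return plants[k - 1::k]


def proportional_allocation(class_sizes, sample_size):
    """Selection by design: share the sample among the classes in proportion
    to their size. Rounding may leave the total slightly off sample_size."""
    total = sum(class_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in class_sizes.items()}


heads = draw_tickets(1200, 400, seed=1)      # 400 ear-heads out of 1,200
picked = every_kth(list(range(1, 101)), 10)  # every 10th plant of 100
shares = proportional_allocation({"A": 6000, "B": 3000, "C": 1000}, 200)
print(odds_against(1200), odds_against(1200 - 390))
print(len(heads), picked, shares)
```

Keeping the bag at constant size means every ticket has the same chance at every drawing, which is the point of the caution given under random selection.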


Reliability of the sample

If two random samples from the same population yield on analysis statistical 
constants such as arithmetical averages, which differ significantly, then we infer 
that either — 

(a) there have been errors in the method of sampling, or
(b) the samples have been very heterogeneous, or
(c) the samples are not sufficiently large.

As will be seen later in our studies of probabilities, the significance or reliability of a
mean or arithmetical average obtained from a sample, as measured by its probable
error or by its standard error, varies inversely as the square root of the number of
variates in the sample. Thus, to double the precision of an arithmetical average
obtained from a sample of 25, we should require to take a sample of 100, since
√25 = 5 and √100 = 10. Similarly, to treble the precision we have to take a
sample of 225 variates, since √225 = 15.
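
The square-root rule can be checked with a short calculation. The sketch below is illustrative only; the function names are hypothetical, and "precision" is taken, as in the text, to be proportional to the square root of the number of variates.

```python
import math


def sample_size_for(current_n, precision_factor):
    """Sample size needed to multiply the precision of a mean by
    precision_factor, given that precision varies as the square root of n."""
    return current_n * precision_factor ** 2


def relative_precision(n):
    """Precision of a mean relative to a single variate (square root of n)."""
    return math.sqrt(n)


print(sample_size_for(25, 2))  # sample needed to double the precision
print(sample_size_for(25, 3))  # sample needed to treble the precision
print(relative_precision(100) / relative_precision(25))
```

Doubling the precision therefore costs four times the number of variates, and trebling it costs nine times.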


REFERENCES 
Tippett, L. H. C. (1931). The Methods of Statistics, Chapter III, Williams and
Norgate, Ltd., London. 

CHAPTER II

FREQUENCY DISTRIBUTIONS AND AVERAGES 

When a sample is taken from a population, it is generally taken with the object
of studying some particular attribute of the universe to which it belongs. Thus, for
instance, if it is desired to study the occurrence of flower-colour in a hybrid
population of linseeds, the attribute of each variate to be measured would be the kind of
colour present. The observer would be confronted at the conclusion of his
observations with a long list of variates and their attributes and his first task would be to
classify these variates according to their attributes. The following table gives a
classification of an F2 population of linseed : —

Table I

Distribution of petal colour in a population of linseeds

Variable or Class      Number of variates or Class frequency
Blue                   109
Lilac                   91
White                   92
Pink                    22
Total                  314


In this case the population of 314 has been divided into four classes and the
frequency of each class shows the distribution of the character or attribute under study
in the population. The variables in this example are discrete, that is to say, the
numbers cannot be continuous since fractional parts of a variate cannot occur. The
division of the population into classes is easy since each variable is clear and distinct
from any of the others. The class frequency of any particular variate is the number
of times that variate occurs in the population.

The frequency distribution, when the attributes which are to be studied are
quantitative, necessitates the classification of a number of numerical quantities
which are continuous, i.e., show all gradations of value within a definite range. If,
for instance, it is desired to measure the length of head of a variety of wheat, the
observer will obtain a large number of measurements from a sample taken with the
precautions previously described. His first task will be to classify these data with
the object of reducing them to a form in which they can be mathematically handled.
The classification of such a mass of data involves the division of the variates into
groups, each group or class containing variates of similar values. Table II gives the
data of the length of ear-head and the number of grains per ear-head in a sample of
Pusa 12 wheat and the frequency distribution of the length of ear-head is shown in
Table III, where the 400 variates are grouped into 17 classes.

Table II

Data of length of ear-head and number of grains per ear in Pusa 12 wheat. Sample
taken from Jhilli field (Pusa Farm) 1930-31 (contd.)

Serial = serial number ; Length = length of ear-head in cms. ; Grains = number of grains per ear-head.

Serial Length Grains   Serial Length Grains   Serial Length Grains   Serial Length Grains
201     9.7    31      251    10.0    31      301     8.7    23      351     ..     31
202    11.7    44      252     8.4    23      302    10.2    30      352     ..     41
203     8.9    23      253    10.6    29      303    12.0    41      353     6.2    16
204    11.6    30      254    10.6    39      304    10.4    35      354    10.8    35
205    11.7    36      255     9.2    34      305    12.0    42      355    10.9    32
206     9.2    31      256    10.5    34      306    11.9    ..      356    11.0    34
207    10.2    31      257     9.4    26      307     8.5    23      357    10.5    34
208    10.6    32      258    10.0    27      308     6.4    14      358    10.0    29
209    10.4    20      259     9.3    32      309    11.4    41      359    11.5    35
210    11.8    41      260    11.2    38      310     9.8    28      360     9.2    28
211    10.6    30      261     8.7    27      311    10.2    28      361     8.0    24
212    10.3    30      262     9.6    34      312    10.9    38      362    11.8    38
213    10.7    26      263     9.2    33      313    10.1    33      363     9.7    28
214    11.8    32      264     9.3    27      314    12.1    41      364     9.0    27
215    11.8    33      265     9.5    34      315     9.5    34      365     6.3    18
216     9.8    30      266     9.8    29      316    10.4    30      366    10.3    41
217     9.0    35      267    10.0    27      317     9.9    39      367     8.6    27
218     9.0    25      268    11.2    31      318     9.2    27      368    10.2    32
219     9.6    31      269    11.0    34      319    11.3    38      369     8.0    23
220     9.0    24      270    11.6    38      320     8.8    26      370    10.0    31
221    13.7    51      271    10.0    37      321     8.4    36      371    10.4    33
222     9.0    26      272    13.2    39      322     8.7    22      372    11.4    33
223     5.7    15      273     7.6    18      323     9.9    26      373     8.6    29
224     9.2    18      274    10.1    34      324    12.1    42      374    11.5    38
225     9.4    32      275    11.4    26      325    10.5    33      375    10.8    36
226    10.7    45      276    12.0    35      326     7.0    21      376    10.5    35
227     8.8    27      277    10.0    ..      327    11.5    35      377    11.8    35
228     8.8    25      278    10.3    34      328     9.7    28      378    10.4    31
229     ..     32      279    10.5    40      329     8.2    21      379    10.4    39
230    10.4    27      280    10.9    43      330     9.6    29      380     9.5    27
231    11.5    32      281    10.8    34      331    11.6    35      381    11.8    42
232    11.9    44      282     9.0    31      332     8.4    23      382     9.8    30
233    10.5    40      283    10.4    28      333     9.5    27      383     8.5    23
234     7.1    10      284    12.3    43      334     8.6    33      384    10.7    35
235    10.8    30      285     9.7    30      335    12.9    40      385    10.0    30
236     9.5    30      286     9.1    30      336     9.2    26      386    10.7    30
237     8.0    35      287    10.0    33      337    10.2    29      387     9.2    24
238    12.2    49      288     8.8    28      338    10.3    32      388    10.8    32
239    11.6    33      289     9.8    28      339    12.4    33      389    10.3    36
240    10.7    42      290     9.7    34      340     9.4    30      390     8.2    22
241     9.0    29      291     5.4    10      341     8.9    25      391     9.5    34
242     8.6    26      292    10.1    29      342     9.9    24      392     6.8    19
243    10.1    32      293     8.3    28      343     9.8    26      393    10.6    ..
244    11.2    23      294     9.0    25      344     6.9    21      394    10.0    ..
245     9.3    25      295    10.7    38      345    11.8    36      395    12.4    39
246    11.2    38      296    10.0    31      346    13.1    32      396     ..     24
247     7.9    19      297     9.3    30      347     9.4    25      397     7.5    21
248     9.0    26      298    13.5    54      348    10.5    31      398    10.3    32
249     9.4    29      299    11.0    31      349    13.6    39      399     8.8    27
250     7.4    17      300    10.2    30      350     8.7    21      400    10.2    31

In the measurement of the length of 400 ear-heads, we find that a wide degree
of variation exists and that a continuous series of measurements is obtained ranging
from one extreme to the other. In this example, the smallest ear-head measures
5.4 cms. and the largest measures 13.7 cms. We, therefore, have a range of 5.3
to 13.7 cms. which can be divided into 17 classes by taking class intervals of 0.5
cm. The mid-point of each class is taken as the class value and the frequency
of the class consists of the number of variates which fall within the limits of the
class.
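
The grouping into classes can be sketched as follows. This is a minimal illustration, not the method used at Pusa: the function name is hypothetical, the lengths are assumed to be recorded to one decimal place, and the work is done in whole tenths of a centimetre so that a value such as 5.7 falls safely in the first class and 5.8 in the second.

```python
from collections import Counter

LOWEST_TENTHS = 53  # lower limit of the first class, 5.3 cm, in tenths
WIDTH_TENTHS = 5    # class interval of 0.5 cm, in tenths


def class_value(length_cm):
    """Return the mid-point (class value) of the class holding length_cm."""
    tenths = round(length_cm * 10)
    index = (tenths - LOWEST_TENTHS) // WIDTH_TENTHS
    lower = LOWEST_TENTHS + index * WIDTH_TENTHS
    return (lower + 2) / 10  # mid-point of lower .. lower + 0.4


# A few lengths in cms, chosen to show the class limits.
lengths = [9.7, 11.7, 8.9, 5.4, 13.7, 5.7, 5.8]
frequency = Counter(class_value(x) for x in lengths)
for value in sorted(frequency):
    print(value, frequency[value])
```

Applied to all 400 lengths, the same counting gives the frequency column of a table such as Table III.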


Table III

Frequency distribution showing the length of ear-heads in Pusa 12 wheat

Classes        Class value    Frequency    Frequency × Class value
(1)            (2)            (3)          (4)
 5.3- 5.7       5.5             3            16.5
 5.8- 6.2       6.0             1             6.0
 6.3- 6.7       6.5             8            52.0
 6.8- 7.2       7.0             6            42.0
 7.3- 7.7       7.5             8            60.0
 7.8- 8.2       8.0            11            88.0
 8.3- 8.7       8.5            32           272.0
 8.8- 9.2       9.0            42           378.0
 9.3- 9.7       9.5            58           551.0
 9.8-10.2      10.0            65           650.0
10.3-10.7      10.5            55           577.5
10.8-11.2      11.0            37           407.0
11.3-11.7      11.5            31           356.5
11.8-12.2      12.0            24           288.0
12.3-12.7      12.5             7            87.5
12.8-13.2      13.0             6            78.0
13.3-13.7      13.5             6            81.0
Total                         400          3991.0



The extreme classes contain very few variates, i.e., their frequencies are very
low, and in the middle classes the variates are more numerous, i.e., the frequencies
are high. The student must exercise caution in classifying those variates whose
values fall about the limits of each class-range. For instance, the first class includes
variates of 5.3 or above up to and including variates of 5.7 ; variates of 5.8 fall into
the second class and similarly variates of 6.3 fall into the third class.

Class interval

In choosing a class interval certain important considerations should be borne
in mind.

(1) The class interval must be of uniform width and of such size that the
characteristic features of the distribution are displayed. Thus, the class
interval must not be so large that a considerable error would be
involved in assuming that the mid-point of the interval is the average
of the class. It must not be so small as to give classes with zero
frequencies, or frequencies approaching zero.

(2) The range of the classes should cover the entire range of the group and
the classes must be continuous.

(3) As a general rule, the number of classes should be about 15 and never more
than thirty nor less than six.

(4) It is a convenience to make the mid-point of a class a whole number.

Graphic representation of frequency distributions

The information contained in Table III can also be expressed as a graph and
indeed this method permits of a ready grasp of certain important features which
are common to some types of frequency distributions. The graph is obtained
by plotting class values as abscissae and class frequencies as ordinates. A curve
is then obtained in which the maximum frequencies are at the middle of the range
and the class frequencies diminish more or less symmetrically in the direction of
the extremes (Fig. 1).



Fig. 1. Frequency curve of length of ear-heads of Pusa 12 wheat in a sample of 400.

Another graphical method of depicting frequency distributions is to measure
along the horizontal axis distances proportional to the class intervals and to raise
on each of these distances rectangles proportional in height to the number of
individuals falling within the class ; the resulting figure is called a histogram. The
two methods are practically equivalent when the class intervals are equal, as they
always are, in the simple statistical problems with which this book deals.



Fig. 2. Histogram showing the distribution of the length of ear-head of Pusa 12 wheat
(horizontal axis : length of ears).


Averages

From the frequency table certain biometrical constants can be calculated which
summarize the main features of the data.

(1) Mean. The mean is the arithmetic average and is the result obtained when
the sum of the measurements of the items in a sample is divided by the number
of items in the sample. When a frequency table is available the mean can be
calculated from the formula,

$$M = \frac{\sum f V}{n} \qquad (1)$$

where f = the class frequency, V = the class value, n = the size of the sample and
Σ indicates the summation of the products of all values of f and V.

Substituting in this formula the values in the example in Table III, we obtain

$$M = \frac{3991.0}{400} = 9.9775 \text{ cms.}$$

The mean in this case, therefore, is 9.9775 cms.

The mean is simply an average and tells us nothing of the distribution, relative
size or number of the items. Thus each of the following series : —

7 7 7 7 7
5 6 7 8 9
4 5 6 8 9 10
1 1 2 2 3 5 35
2 12

has 7 as an arithmetic average.

The mean is useful because it gives weight to all items in direct proportion to
their size and lends itself to algebraic treatment. Thus the averages of two or
more series may be obtained from the averages of the individual series and the
algebraic sum of plus and minus deviations from the mean is zero.

(2) Weighted mean. It sometimes happens that some determinations of
the mean are made under more favourable conditions than others and are, therefore,
esteemed as of greater reliability. More importance is attached to such
determinations by considering each observation to be equal to at least 2 or 3 ordinary
determinations. That is to say, that the weight of such a determination of the mean
is equal to 2 or 3 determinations.

(3) Mode. The mode is the size of that variate which occurs most frequently.
In a frequency table the modal class is the class which has the greatest frequency.
This class can be determined at once from inspection, but the true value of the mode
will be located somewhere in that class interval, not necessarily at the mid-point
of the class.

(4) Median. The median is the value which is located in the middle of a series
when the items are arranged in order of magnitude and which divides the series
into two equal parts so far as the number of items in the series is concerned. The
determination of the median is a simple matter when there is an odd number of
items in the series. Thus, if 101 items are placed in the order of their magnitude,
the 51st item will be the value of the median. If there are an even number of
items in a series, the average of the two central values may be taken as expressing
the median. In a frequency distribution such as we have described for wheat
in which the curve expressing the distribution approaches a symmetrical form, the
median will be the value of that ordinate which divides the curve into two equal
areas.

(5) Range. The total range of a distribution is given by the difference between
the smallest and the largest variates and is an indication of the dispersion or
variability of the distribution. In our wheat example the range is from 5.3 cms. to
13.7 cms. and the variation in the length of ear-heads is, therefore, between these
two limits.
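
The averages described above can be worked through in Python. The sketch below is illustrative only: the variable names are hypothetical, and the frequency of the 10.3-10.7 class is taken as 55, the value consistent with the totals of 400 and 3991.0 given for Table III.

```python
import statistics

# Class values 5.5, 6.0, ..., 13.5 and class frequencies as in Table III
# (assumption: frequency 55 for the class whose value is 10.5).
class_values = [5.5 + 0.5 * i for i in range(17)]
frequencies = [3, 1, 8, 6, 8, 11, 32, 42, 58, 65, 55, 37, 31, 24, 7, 6, 6]

# Mean from formula (1): sum of f x V divided by n.
n = sum(frequencies)
sum_fv = sum(f * v for f, v in zip(frequencies, class_values))
mean = sum_fv / n

# Modal class: the class with the greatest frequency.
modal_class_value = class_values[frequencies.index(max(frequencies))]

# Median: middle item of an odd series, average of the two central
# items of an even series.
median_odd = statistics.median(list(range(1, 102)))   # 101 ordered items
median_even = statistics.median([4, 5, 6, 8, 9, 10])

# Very different series can share the same mean.
series = [[7, 7, 7, 7, 7], [5, 6, 7, 8, 9], [4, 5, 6, 8, 9, 10],
          [1, 1, 2, 2, 3, 5, 35], [2, 12]]
means = [sum(s) / len(s) for s in series]

print(n, sum_fv, mean, modal_class_value)
print(median_odd, median_even, means)
```

The modal class value found here is only the mid-point of the modal class; as noted under (3), the true mode lies somewhere within that class interval.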

Frequency curve 

The frequency graph for a series distributed almost symmetrically about the 
mean approaches more and more to a smooth curve as the number of items in the 
series increases, i.e., as the sample gets larger. It can be shown mathematically
that with an infinite number of items a perfectly symmetrical curve is produced
which is representative of the successive terms of the expansion of the binomial
(a + b)^n when a = b = 1 and n is any integer. Thus —

(a + b)^1 = 1 + 1

(a + b)^2 = 1 + 2 + 1

(a + b)^4 = 1 + 4 + 6 + 4 + 1

(a + b)^6 = 1 + 6 + 15 + 20 + 15 + 6 + 1

(a + b)^10 = 1 + 10 + 45 + 120 + 210 + 252 + 210 + 120 + 45 + 10 + 1.

Such a perfectly symmetrical curve is called the normal curve, and in it the mean, 
the median and the mode coincide and divide the curve into two equal areas, half 
the total number of items being in each half of the curve. 
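
The terms of these expansions are the binomial coefficients, which can be generated directly. The sketch below is illustrative, with a hypothetical function name; it uses `math.comb` from the Python standard library (Python 3.8 or later).

```python
from math import comb


def binomial_terms(n):
    """Successive terms of (a + b)^n when a = b = 1."""
    return [comb(n, k) for k in range(n + 1)]


for n in (1, 2, 4, 6, 10):
    terms = binomial_terms(n)
    # Each row is symmetrical about its middle and sums to 2^n.
    print(n, terms, sum(terms))
```

Each row reads the same forwards and backwards, which is the symmetry that the normal curve shows in the limit.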



Fig. 3. The normal curve and curves for values of (a + b)^n.

The slope of the curve is an indication of the amount of variability in the sample
and the width at the point of greatest breadth indicates the range of variability.
The steeper the slope of the curve, the less the amount of variation in the sample.
The ordinates which divide each half of the total area of the curve into two equal
parts are called the quartiles and with the median divide the curve into four equal
areas, that is to say, in a perfectly symmetrical distribution the median, with which
the mean coincides, and the quartiles divide the items into four numerically equal
groups. The steeper the slope of the curve, the shorter is the distance between a
quartile and the mean, and hence the distance between the quartile and the mean
may be used as a measure of variability.




Fig. 4. Normal curves showing means and quartiles.


Fig. 4 above illustrates the relationship of the mean and the quartiles and the
slope of the curves in two curves of similar area. The curve on the right hand
represents a sample with a larger amount of variability than the curve on the left.
In each curve the distance

MQ = MQ'.

Curves based on adequate samples from a homogeneous population are generally
unimodal. If the curve shows more than one mode, this is an indication of lack
of uniformity in the sample. In the case of the measurements of a biological
variable such as length of ear-head in wheat, the presence of a multimodal curve would
lead us to infer that the sample had been taken from a mixture of different types
or that the sample was inadequate in size. Of course, no curve based on measurements
of a biological variable will exhibit the perfect symmetry of the normal curve
but if the sampling has been adequate and the population is uniform, the departure
from symmetry is not such as to preclude the application of the statistical principles
of the normal curve to the solution of such problems. When curves are not
symmetrical, the mean and the mode will not coincide, and such curves are said to
show skewness.

CHAPTER III

MEASUREMENT OF DISPERSION

The mean of a sample is a measure of the type which constitutes the population;
it tells us, however, nothing as to the extent and nature of the variability or
dispersion within the type, nor do the mode and the median by themselves give
us any estimation of dispersion. In any sample which follows a normal distribution,
the variates are dispersed about the mean in a more or less symmetrical manner.
The limits of dispersion are marked by the smallest and the largest variates
and give us the range of the variability. As already explained, the slope of the
frequency curve furnishes an indication of the degree and the nature of dispersion,
but since curves based on actual samples are never absolutely symmetrical, the mean
and the median in such curves do not coincide and the quartiles in such curves are
not equidistant from the median and therefore do not afford a satisfactory index
of variability; moreover, they are reckoned with reference to the median and not
to the mean and it is usual and preferable in actual samples to calculate the
dispersion about the mean. For these reasons, the measure of variability in use is
not the quartile but the standard deviation, a function which treats deviations
above and below the mean on the same terms. The standard deviation is a measure
of dispersion from the arithmetic mean and is calculated by squaring the deviation
of each item from the mean, summing the squares, dividing by the number
of observations and then extracting the square root, as shown in the following
formula:

$\sigma = \sqrt{\dfrac{S f d^2}{n}}$

where $\sigma$ = the standard deviation,
$f$ = class frequency,
$d$ = the deviation of the class value from the mean,
$S$ = the symbol for summation of all values of $f \times d^2$,
and $n$ = the number of variates in the sample.

Since both negative deviations below the mean and the positive deviations above
the mean are squared, the products of $f$ and $d^2$ are always positive numbers.

The student must remember that the standard deviation is an absolute measure
of dispersion and is expressed in terms of the unit of measurements, e.g., grammes,
inches, pounds, etc.

The term variance is used to denote the square of the standard deviation, i.e.,

$\sigma^2 = \dfrac{S f d^2}{n}$
A simpler measure of variability is afforded by the average or mean deviation.
This is calculated by summing the deviations from the mean irrespective of the
sign and dividing by the number of observations. The average deviation is inferior
to the standard deviation because the squaring of deviations in the latter case gives
more adequate representation to the extreme variates which are generally of low
frequencies. The formula for calculating the average deviation is

$\text{Average deviation} = \dfrac{S f |d|}{n}$   (3)

where $|d|$ is the deviation taken without regard to sign.


The comparison of two standard deviations of different samples, however, affords
little indication of the relative amounts of variation in the two samples. For
instance, a standard deviation of 2 in a sample with a mean of 100 actually indicates
a smaller amount of variation than a standard deviation of 0.5 in a sample with a
mean of 10. It is obvious that the quantity 2 considered in relation to the mean
of 100 (2 per cent) is smaller than the quantity 0.5 in relation to the mean of 10
(5 per cent). In comparing, therefore, the amounts of variation in two distributions,
it is desirable to take account of the relative size of the items. It must also be
remembered that the variation as measured by the standard deviation may be
expressed in different units, e.g., grammes, inches, pounds, etc., in different
distributions and, therefore, it is desirable to have a relative measure of variation
when comparing several samples. It is usual to make such a comparison by means
of the coefficient of variation which is a percentage ratio of the standard deviation
to the mean, according to the following formula:

$\text{C.V.} = \dfrac{\sigma}{M} \times 100$   (4)

Applying this formula to our example of the length of ear-head in Pusa 12 wheat,
we have

$M = 9.9775$ cms.

$\sigma = 1.4408$ cms.

therefore $\text{C.V.} = \dfrac{1.4408}{9.9775} \times 100 = 14.44$ per cent,

and we may say that the coefficient of variation is about 14 per cent.

The use of the coefficient of variation, as a more reliable indication of the amount
of variability than the standard deviation, is shown in the following example of
the average yields of B. S. 1 oats in two successive years at Karnal in plots of equal
area with five replications.

Year        Mean     Std. dev.   Coeff. var.
            lb.      lb.         %

1931-32     31.80    7.527       23.67
1932-33     55.90    7.838       14.02

The values of the two standard deviations do not differ widely but considered
in relation to their respective means it is evident that the amount of variability
in the second year as measured by the coefficient of variation is much smaller than
in the first year.
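
The comparison may be repeated as a small sketch. The figures are those of the oats table, read so that each coefficient of variation agrees with its own mean and standard deviation; the function name is an assumption of this example.

```python
def coefficient_of_variation(mean, std_dev):
    """Standard deviation expressed as a percentage of the mean (Formula 4)."""
    return std_dev / mean * 100


# Mean and standard deviation in lb. for B. S. 1 oats at Karnal.
oats = {"1931-32": (31.80, 7.527), "1932-33": (55.90, 7.838)}
cv = {year: coefficient_of_variation(m, s) for year, (m, s) in oats.items()}

# Length of ear-head in Pusa 12 wheat, from the worked example in the text.
cv_wheat = coefficient_of_variation(9.9775, 1.4408)

for year, value in cv.items():
    print(year, round(value, 2))
print("Pusa 12 ear length", round(cv_wheat, 2))
```

Although the two standard deviations differ by only about 0.3 lb., the coefficients differ by nearly ten units because the second mean is so much larger.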

Methods of calculating mean, standard deviation and coefficient of variation

(1) The ordinary method. This method is sometimes called the long method
and is simply the straightforward application of the principles described in the
preceding pages. Table IV gives the details of the calculations for the number
of grains per ear in Pusa 12 wheat. The data from Table II are first arranged in
a frequency table as in the previous example of the length of ear-heads; in the
case of the number of grains per ear we are dealing with a discrete variable since
fractions of a grain cannot occur. The class limits will, therefore, run 8-12, 13-17,
18-22, and so on, and there is little possibility of variates which occur near
the class limits being wrongly classified. In Table IV, columns 1, 2, 3 and 4 give
the data necessary for the calculation of the mean and are the same as Table III;
columns 5-7 deal with the deviation, $d$, of each class value from the mean and the
summation of the products $f.d^2$. The standard deviation is calculated from the
formula already given, but a correction factor has to be applied to compensate for
the error introduced by grouping variates into classes and basing the calculations
on the values of class centres. The reason for this correction, which is called
Sheppard's Correction, is that the class centres may not be mid-points of the
distributions of the variates within the classes and yet this is the assumption which
is made when we group the variates into classes and base our calculations on the
class centres without correction. Sheppard's correction for standard deviations is
1/12th of the square of the class interval and is to be deducted from the term
$\dfrac{S f.d^2}{n}$, as is shown in the following example, where the class interval is 5
and Sheppard's correction amounts to $5^2/12 = 2.0833$.


Table IV

Calculation of mean and standard deviation of the number of grains per ear-head in
Pusa 12 wheat

Class     Class    Frequency   Frequency ×   Deviation   Deviation    Frequency ×
          value                Class value   from the    squared      Deviation
                                             mean                     squared
C         v        f           f.v           d           d^2          f.d^2
1         2        3           4             5           6            7

8-12      10         1            10         -20.55      422.3025       422.3025
13-17     15        17           255         -15.55      241.8025      4110.6425
18-22     20        25           500         -10.55      111.3025      2782.5625
23-27     25        86          2150          -5.55       30.8025      2649.0150
28-32     30       125          3750          -0.55        0.3025        37.8125
33-37     35        77          2695          +4.45       19.8025      1524.7925
38-42     40        55          2200          +9.45       89.3025      4911.6375
43-47     45         9           405         +14.45      208.8025      1879.2225
48-52     50         4           200         +19.45      378.3025      1513.2100
53-57     55         1            55         +24.45      597.8025       597.8025

Total              400         12220                                  20429.0000

$M = \dfrac{S f.v}{n} = \dfrac{12220}{400} = 30.55$

$\sigma = \sqrt{\dfrac{S f.d^2}{n} - \dfrac{i^2}{12}}$, where $i$ = class interval,

$\sigma = \sqrt{\dfrac{20429}{400} - 2.0833} = 6.9992$

$\text{C.V.} = \dfrac{\sigma}{M} \times 100 = \dfrac{6.9992}{30.55} \times 100 = 22.911$ per cent of the mean.
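
The long method can be followed step by step in the sketch below, which is offered only as an illustration. The class values and frequencies are those of the grains-per-ear distribution; the variable names are assumptions of this example.

```python
from math import sqrt

# Class values and frequencies for grains per ear-head in Pusa 12 wheat.
class_values = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
frequencies = [1, 17, 25, 86, 125, 77, 55, 9, 4, 1]
class_interval = 5

n = sum(frequencies)
mean = sum(f * v for f, v in zip(frequencies, class_values)) / n

# Sum of frequency x squared deviation from the mean.
sum_fd2 = sum(f * (v - mean) ** 2 for f, v in zip(frequencies, class_values))

# Sheppard's correction: one twelfth of the squared class interval,
# deducted from the variance before the square root is extracted.
sheppard = class_interval ** 2 / 12
std_dev = sqrt(sum_fd2 / n - sheppard)
cv = std_dev / mean * 100

print(n, mean, round(sum_fd2, 4), round(sheppard, 4))
print(round(std_dev, 4), round(cv, 3))
```

Without the correction the standard deviation would come to the square root of 51.0725, about 7.146, so the grouping correction lowers it by roughly 2 per cent in this example.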

The frequency curve of the distribution of number of grains per ear-head is
shown in Fig. 5.

Fig. 5. Frequency curve of the number of grains per ear-head in Pusa 12 wheat.
(Horizontal axis: number of grains, 10 to 55.)

(2) The short method. The long and tedious calculation of the preceding example
may be considerably shortened by the use of the short method illustrated in Tables
V and VI. In this method a convenient assumed value, which must agree with
one of the class values, is taken as the mean and is referred to under the letters
A.O., meaning arbitrary origin. The deviations, $d'$, from this arbitrary origin are
taken as shown in column 4 in Table V without reference to the class values and are
expressed as units of class intervals; thus the deviation, $d'$, of the last class is given
as 9 because this class is 9 class-intervals removed from the class which contains
the A.O. For this last class the frequency is 1 and therefore $f.d'^2 = 1 \times (9)^2$.

The products $f.d'$ and $f.d'^2$ are given in columns 5 and 6. Since the calculations
have been based not on the true mean but upon the arbitrary origin and since the
deviations are expressed as units of class intervals, corrections have to be applied
to the arbitrary origin to determine the mean and to the usual formula to determine
the standard deviation. Thus

$M = \text{A.O.} + i \left( \dfrac{S f.d'}{n} \right)$   (5)

and $\sigma = \sqrt{\left\{ \dfrac{S f.d'^2}{n} - \left( \dfrac{S f.d'}{n} \right)^2 \right\} \times i^2 - \dfrac{i^2}{12}}$   (6)

A word of caution seems necessary in applying Sheppard's correction. This
correction is 1/12th of the class interval squared and must invariably be deducted
from the variance after the necessary corrections have been made as shown in
Formula 6.

In this example the class value of the first class has been taken as the arbitrary
origin and all deviations, $d'$, are positive; this, however, is not necessary and a
class value in the middle or in any other part of the distribution may be taken as
the arbitrary origin, in which case the lower classes will give negative deviations
and the higher classes will give positive deviations. This is done in Table VI which
gives the calculations of mean and standard deviation for the length of ear-head
in Pusa 12 wheat by the short method.

Table V

Calculation of mean and standard deviation of the number of grains per ear-head in
Pusa 12 wheat

Class     Class    Frequency   Deviation from     Frequency ×   Frequency ×
          value                arbitrary origin   Deviation     Deviation squared
C         v        f           d'                 f.d'          f.d'^2
1         2        3           4                  5             6

8-12      10         1         0                     0             0
13-17     15        17         1                    17            17
18-22     20        25         2                    50           100
23-27     25        86         3                   258           774
28-32     30       125         4                   500          2000
33-37     35        77         5                   385          1925
38-42     40        55         6                   330          1980
43-47     45         9         7                    63           441
48-52     50         4         8                    32           256
53-57     55         1         9                     9            81

Total              400                            1644          7574

$i = 5$
A.O. $= 10$

Correction factor $= \dfrac{S f.d'}{n}$

$M = \text{A.O.} + i \times \dfrac{S f.d'}{n} = 10 + 5 \times \dfrac{1644}{400} = 30.55$

$\sigma = \sqrt{\left\{ \dfrac{S f.d'^2}{n} - \left( \dfrac{S f.d'}{n} \right)^2 \right\} \times i^2 - \dfrac{i^2}{12}} = \sqrt{\left\{ \dfrac{7574}{400} - \left( \dfrac{1644}{400} \right)^2 \right\} \times 25 - \dfrac{25}{12}} = 6.9992$

$\text{C.V.} = \dfrac{\sigma}{M} \times 100 = \dfrac{6.9992 \times 100}{30.55} = 22.911$ per cent.


Table VI

Calculation of mean and standard deviation of the length of ear-head in Pusa 12 wheat

Class        Class    Frequency   Deviation from     Frequency ×   Frequency ×
             value                arbitrary origin   Deviation     Deviation squared
C            v        f           d'                 f.d'          f.d'^2
1            2        3           4                  5             6

5.3-5.7       5.5       3         -8                  -24           192
5.8-6.2       6.0       1         -7                   -7            49
6.3-6.7       6.5       8         -6                  -48           288
6.8-7.2       7.0       6         -5                  -30           150
7.3-7.7       7.5       8         -4                  -32           128
7.8-8.2       8.0      11         -3                  -33            99
8.3-8.7       8.5      32         -2                  -64           128
8.8-9.2       9.0      42         -1                  -42            42
9.3-9.7       9.5      58          0                    0             0
9.8-10.2     10.0      65         +1                  +65            65
10.3-10.7    10.5      55         +2                 +110           220
10.8-11.2    11.0      37         +3                 +111           333
11.3-11.7    11.5      31         +4                 +124           496
11.8-12.2    12.0      24         +5                 +120           600
12.3-12.7    12.5       7         +6                  +42           252
12.8-13.2    13.0       6         +7                  +42           294
13.3-13.7    13.5       6         +8                  +48           384

Total                 400                            -280          3720
                                                     +662
                                                     ----
                                                     +382

$i = 0.5$ cms.
A.O. $= 9.5$ cms.

Correction factor $= \dfrac{S f.d'}{n}$ for determining true Mean,

$\left( \dfrac{S f.d'}{n} \right)^2$ for determining standard deviation.

$M = \text{A.O.} + i \times \dfrac{S f.d'}{n} = 9.5 + 0.5 \times \dfrac{382}{400} = 9.9775$

$\sigma = \sqrt{\left\{ \dfrac{3720}{400} - \left( \dfrac{382}{400} \right)^2 \right\} \times 0.25 - \dfrac{1}{12} \text{ of } 0.25} = 1.4408$

$\text{C.V.} = \dfrac{\sigma}{M} \times 100 = \dfrac{1.4408 \times 100}{9.9775} = 14.44$ per cent.
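
The short method is sketched below for the length of ear-head, under the assumption that the frequencies are read so that they agree with the printed totals (400 variates, sum of f.d' = +382, sum of f.d'^2 = 3720). The function name `short_method` is an assumption of this example.

```python
from math import sqrt


def short_method(class_values, frequencies, origin, interval):
    """Mean and Sheppard-corrected standard deviation from an arbitrary origin."""
    n = sum(frequencies)
    # Deviations from the arbitrary origin, in units of the class interval.
    steps = [(v - origin) / interval for v in class_values]
    c = sum(f * d for f, d in zip(frequencies, steps)) / n   # correction factor
    mean_sq = sum(f * d * d for f, d in zip(frequencies, steps)) / n
    mean = origin + interval * c
    variance = (mean_sq - c ** 2) * interval ** 2 - interval ** 2 / 12
    return mean, sqrt(variance)


# Class values 5.5, 6.0, ... 13.5 cms. and their frequencies.
values = [5.5 + 0.5 * k for k in range(17)]
freqs = [3, 1, 8, 6, 8, 11, 32, 42, 58, 65, 55, 37, 31, 24, 7, 6, 6]

mean, sd = short_method(values, freqs, origin=9.5, interval=0.5)
# Any other class value taken as arbitrary origin must give the same constants.
mean_alt, sd_alt = short_method(values, freqs, origin=5.5, interval=0.5)

print(round(mean, 4), round(sd, 4), round(sd / mean * 100, 2))
```

Carried to more decimal places the standard deviation is about 1.4409, so it agrees with the printed 1.4408 except for rounding in the last figure.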


The short cut methods give true values for the biometrical constants and not
mere approximations to the results achieved by the long method.

In the preceding examples we were dealing with large samples in which the
data were grouped. It sometimes happens, however, that the mean and the standard
deviation have to be calculated from a few observations, in which case the
calculation is made directly from the observed data. Thus in the following example
of the yields of 25 plots of barley [Shaw and Bose, 1929] the calculations of these
constants are shown below by two methods.

Table VII

Calculation of the standard deviation, etc., by the Assumed Mean Method

Assumed Mean = 330

Yields in grm.   Deviations from assumed mean   Deviations squared
                 -            +

452                           122                14884
352                            22                  484
450                           120                14400
455                           125                15625
372                            42                 1764
262               68                              4624
275               55                              3025
332                             2                    4
252               78                              6084
442                           112                12544
204              126                             15876
208              122                             14884
280               50                              2500
278               52                              2704
201              129                             16641
231               99                              9801
313               17                               289
181              149                             22201
312               18                               324
182              148                             21904
472                           142                20164
380                            50                 2500
316               14                               196
358                            28                  784
308               22                               484

Total          -1147         +765               204690

C (Correction factor) $= \dfrac{-1147 + 765}{25} = -15.28$

Mean $= 330 - 15.28 = 314.72$

$\sigma = \sqrt{\dfrac{S d'^2}{n} - C^2} = \sqrt{\dfrac{204690}{25} - (-15.28)^2} = \sqrt{8187.60 - 233.48} = \sqrt{7954.12} = 89.19$

$\text{P.E.}_m$ (Error of mean) $= \dfrac{0.6745 \times 89.19}{\sqrt{25}} = 12.03$

Table VIII

Calculation of standard deviation, etc., by the Yield Square Method, especially adapted
for machine calculations

Yield     (Yield)^2

452       204304
352       123904
450       202500
455       207025
372       138384
262        68644
275        75625
332       110224
252        63504
442       195364
204        41616
208        43264
280        78400
278        77284
201        40401
231        53361
313        97969
181        32761
312        97344
182        33124
472       222784
380       144400
316        99856
358       128164
308        94864

Total 7868   2675070

Mean $= \dfrac{7868}{25} = 314.72$

$\sigma = \sqrt{\dfrac{2675070}{25} - (314.72)^2} = \sqrt{107002.80 - 99048.6784} = \sqrt{7954.122} = 89.19$

$\text{P.E.}_m = \dfrac{0.6745 \times 89.19}{\sqrt{25}} = 12.03$

Babcock, E. B., and Clausen, R. E. (1927). Genetics in Relation to Agriculture, 2nd Edition,
pp. 170-171.

Harper, F. H. (1930). Elements of Practical Statistics, p. 141.

Shaw, F. J. F., and Bose, R. D. (1929). Yield trials with some Pusa barleys. Agric. J.
India, 24, pp. 384-386.

Sinnott, E. W., and Dunn, L. C. (1932). Principles of Genetics, 2nd Edition, McGraw-Hill
Book Co., New York, pp. 356-357.

CHAPTER IV

PROBABILITY AND GOODNESS OF FIT

The normal curve represents the distribution of an infinite series of observations
and is a perfectly smooth symmetrical curve in which the mean, median and
mode coincide. In such a curve half of the total number of variates lie on the
side above the mean and half on the side of the curve below the mean. With the
median the quartiles divide the observations into four numerically equal groups
and it follows, therefore, that one half of the total number of variates lie between
the quartiles and one half lie beyond the range of the quartiles. In the normal
perfectly symmetrical curve we may regard the mean value as representing the
variate of most frequent occurrence, since mean and mode coincide in the normal
curve, and variates of other values may be considered as deviations from the mean.
The normal curve, therefore, represents the probability of occurrence of a variate 
of any value which is included in the distribution, those of values close to the mean 
being more frequent in occurrence than those of values approaching the limits of 
the distribution. 

If in a normal frequency distribution we measure off on both sides of the mean 
a distance equal to the standard deviation we have an area, or range, that includes 
approximately 68 per cent of the total number of items in the distribution. The 
distance thus measured from both sides of the mean extends to points on a normal 
frequency curve where the curve changes from a concave to a convex surface, i.e., 
at the points of inflexion. 

In the case of the normal curve it can be shown mathematically that, if Q and
Q' are the distances of the quartiles from the mean, then

$Q = Q' = 0.6745\,\sigma$,

or approximately

3 (distance of the quartile from mean) = $2\sigma$.

The relationship $Q = 0.6745\,\sigma$ is one which the student should memorize; its
proof is outside the scope of this book.

In Figure 6, if M is the value of the mean and Q and Q' are the distances of the
quartiles from the mean, then the two values M + Q' and M - Q will mark the
limits between which fifty per cent of the variates will occur. It is, therefore, an
even chance that any single variate picked at random from the sample will be of a
value within these limits or outside them; but

$Q' = Q = 0.6745\,\sigma$,

therefore, it is an even chance that any single variate picked at random will fall
within the values $M \pm 0.6745\,\sigma$. The quantity $0.6745\,\sigma$ is called the probable
error of any variate; this simply means that the odds are as 1 : 1 that any item in
the distribution chosen at random will fall within the limits $M \pm 0.6745\,\sigma$.

Fig. 6. Normal curve showing relative position of mean, quartiles and various multiples of $\sigma$.

In the normal curve the student will note that

$M \pm (0.6745\,\sigma)$ include 50 % of the variates
$M \pm 2\,(0.6745\,\sigma)$ include 82.3 % of the variates
$M \pm 3\,(0.6745\,\sigma)$ include 95.7 % of the variates
$M \pm 4\,(0.6745\,\sigma)$ include 99.3 % of the variates
$M \pm 5\,(0.6745\,\sigma)$ include 99.9 % of the variates,

similarly the values

$M \pm \sigma$ include 68.3 % of the variates
$M \pm 2\sigma$ include 95.5 % of the variates
$M \pm 3\sigma$ include 99.7 % of the variates
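
These percentages can be recomputed from the error function, as in the sketch below. It relies on the standard identity that the fraction of a normal distribution within k standard deviations of the mean is erf(k / sqrt 2); the names used are assumptions of this example.

```python
from math import erf, sqrt

PE = 0.6745  # probable error of a variate, in units of the standard deviation


def fraction_within(k):
    """Fraction of a normal distribution lying within k sigma of the mean."""
    return erf(k / sqrt(2))


for multiple in (1, 2, 3, 4, 5):
    pct = 100 * fraction_within(multiple * PE)
    print(f"M +/- {multiple} x 0.6745 sigma: {pct:.1f} %")
for multiple in (1, 2, 3):
    pct = 100 * fraction_within(multiple)
    print(f"M +/- {multiple} sigma: {pct:.2f} %")
```

The value for two standard deviations comes out at 95.45 per cent, which the text gives to one decimal place as 95.5.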

Consideration of one of these values will make the significance of the foregoing
clear. Let us take the value $M \pm 2\,(0.6745\,\sigma)$. Since 82.3 per cent of the variates
are included within these limits 17.7 per cent will lie in the portions of the curve
outside these limits, i.e., towards the tails of the curve. The chances against the
occurrence of a positive or negative deviation from the mean as large or larger
than $2\,(0.6745\,\sigma)$, therefore, are as

82.3 : 17.7 or
4.6 : 1

In other words, the odds are 4.6 : 1 against the random selection in a sample of a
variate deviating from the mean by $2\,(0.6745\,\sigma)$ or more. The student should note
carefully the difference between the odds against the occurrence of a deviation of a
particular value and the odds against the occurrence of a variate of a particular size.
Since we are considering a normal distribution of which 17.7 per cent of the items
lie outside the limits $M \pm 2\,(0.6745\,\sigma)$, it is obvious that 8.85 per cent of the
variates will lie beyond the upper value, $M + 2\,(0.6745\,\sigma)$, and 8.85 per cent will
lie below the lower value, $M - 2\,(0.6745\,\sigma)$; that is, each tail of the curve
contains 8.85 per cent of the variates.

Therefore, the odds against the occurrence of a variate as large as, or larger than,
$M + 2\,(0.6745\,\sigma)$ are as

(8.85 + 82.30) : 8.85
or
10.3 : 1

and similarly the odds against the occurrence of a variate as small as or smaller
than $M - 2\,(0.6745\,\sigma)$ are as

10.3 : 1.
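
The distinction between the two kinds of odds is easily seen in a small calculation. The sketch below is illustrative only and its variable names are assumptions of this example.

```python
from math import erf, sqrt

k = 2 * 0.6745                      # twice the probable error, in sigma units
inside = 100 * erf(k / sqrt(2))     # per cent of variates within M +/- k sigma
outside = 100 - inside              # per cent in the two tails together
one_tail = outside / 2              # per cent in each tail

# Odds against a deviation this large in either direction.
odds_deviation = inside / outside
# Odds against a variate beyond one particular limit (upper or lower).
odds_variate = (100 - one_tail) / one_tail

print(round(odds_deviation, 1), round(odds_variate, 1))
```

The second figure is larger because only one tail counts as the event, while the other tail is added to the cases in which it does not occur.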

Further consideration of the estimation of probability from the proportions of the
normal curve will be deferred to the chapter on the Probability Integral.

Probable error

In compiling frequencies and calculating means and other statistical constants
the first essential is that the sample should be truly representative of the universe
from which it is taken. We endeavour to secure this by taking as large a number
of items as can be handled, and by taking them at random. No two samples, however,
will exactly agree and, therefore, the means of a number of samples will vary
very slightly, the extent of the variation depending upon the accuracy of the
sampling. How can we estimate which of the values obtained for the mean or any
other statistical constant from a number of different samples is the most accurate?
An estimate of this is furnished by the quantity termed the probable error which
can be readily calculated from the standard deviation of the constant. In practice,
when we have determined a statistical constant, we put after it a value which
represents the limits within which it may be expected that any subsequent
determination from a similar sample will fall. This value is the probable error, an
arbitrary term used to denote the amount which must be added to or subtracted from the
observed value to obtain two limiting figures of which it may be said that it is an
even chance that the true value lies within these limits.

The probable error of the mean of a sample is calculated by dividing the probable
error of any variate by the square root of the number of variates in the sample. Thus,

$\text{P.E.}_m = \dfrac{0.6745\,\sigma}{\sqrt{n}}$   (7)

The student must distinguish carefully between the probable error of the mean
and the probable error of any variate and note that the former is a function of the
square root of the size of the sample; hence the importance of dealing with as
large a sample as possible. The term, probable error, is a somewhat unfortunate
one and is in no way to be taken as indicating the inaccuracy which is likely to occur
in the course of an experiment; it is not the most probable mistake. The probable
error has in modern statistics been largely superseded by the standard error. The
formula for the standard error is the same as that for the probable error without
the decimal fraction 0.6745. The standard error of the mean is, therefore,

$\dfrac{\sigma}{\sqrt{n}}$

and the relationship 2 S.E. = 3 P.E. is approximately true.


The probable error of the standard deviation is given by a formula which is
very similar to the above:

$\text{P.E.}_\sigma = \dfrac{0.6745\,\sigma}{\sqrt{2n}}$

The fractions $\dfrac{0.6745}{\sqrt{n}}$ and $\dfrac{0.6745}{\sqrt{2n}}$ have been calculated for all values of $n$ from 1
to 1,000 and are published in Pearson's Tables; this enormously facilitates the
calculation of probable errors.

Example 1. In a sample of 400 heads of Pusa 12 wheat the mean number of
grains per ear-head is 30.55 and the standard deviation is 6.9992. From Pearson's
Table V:

$\dfrac{0.6745}{\sqrt{400}} = 0.03372$,

therefore, $\text{P.E.}_m = 0.03372 \times 6.9992 = 0.23601$.

Similarly, $\dfrac{0.6745}{\sqrt{400 \times 2}} = 0.02385$,

therefore, $\text{P.E.}_\sigma = 0.02385 \times 6.9992 = 0.16693$.

From this example we see that the mean number of grains per ear-head is 30.55
and to this quantity we attach a probable error ± 0.23601. It is usual to express
these facts as

$M = 30.55 \pm 0.23601$

From this we understand that in any subsequent determination of the mean of
this variable from a sample of the same size it is an even chance that the value
obtained will lie between the limits 30.314 and 30.786. Similarly, the value of the
standard deviation in the above example is

$\sigma = 6.9992 \pm 0.16693$
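
Where Pearson's Tables are not to hand, the two fractions can be computed directly. The sketch below is a minimal illustration with assumed function names; it reproduces Example 1 to within the rounding of the tabled values.

```python
from math import sqrt


def pe_of_mean(sd, n):
    """Probable error of the mean: 0.6745 sigma / sqrt(n)."""
    return 0.6745 * sd / sqrt(n)


def pe_of_sd(sd, n):
    """Probable error of the standard deviation: 0.6745 sigma / sqrt(2n)."""
    return 0.6745 * sd / sqrt(2 * n)


n, mean, sd = 400, 30.55, 6.9992
pe_m = pe_of_mean(sd, n)
pe_s = pe_of_sd(sd, n)

print(round(pe_m, 4), round(pe_s, 4))
# Limits within which a fresh determination of the mean has an even chance of falling.
print(round(mean - pe_m, 3), round(mean + pe_m, 3))
```

The direct values differ from the printed ones only in the fifth decimal place, because the tabled fractions are themselves rounded to five places.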

The probable error of the coefficient of variation is given by the formula

$\text{P.E.}_{C.V.} = \dfrac{0.6745}{\sqrt{2n}} \times V \left\{ 1 + 2 \left( \dfrac{V}{100} \right)^2 \right\}^{1/2}$   (9)

where $V$ is the coefficient of variation.

As already explained the fraction $\dfrac{0.6745}{\sqrt{2n}}$ may be obtained from Pearson's tables
for all values of $n$ up to $n = 1,000$. The remainder of the expression may also be
obtained from Pearson's tables for all values of $V$ up to $V = 50$. The calculation
of the probable error of the coefficient of variation, therefore, simply involves the
product of two numbers which can be obtained from these tables. The long and
complicated term

$V \left\{ 1 + 2 \left( \dfrac{V}{100} \right)^2 \right\}^{1/2}$

is referred to in Pearson's Table VI by the symbol $\psi$.

Example 2. The coefficient of variation of the number of grains per ear-head
in Pusa 12 wheat is

$\text{C.V.} = \dfrac{\sigma}{M} \times 100 = \dfrac{6.9992}{30.5500} \times 100 = 22.911$

For the calculation of the probable error we find from the Tables that

$\dfrac{0.6745}{\sqrt{400 \times 2}} = 0.02385$

and when $V = 22.911$, then $\psi = 24.0836$,
therefore P.E. $= 0.02385 \times 24.0836 = 0.5744$,
i.e., C.V. $= 22.911 \pm 0.5744$
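
The tabled quantity can also be worked out directly, as in this sketch. The function name and the use of `psi` for the long term are assumptions of this example.

```python
from math import sqrt


def pe_of_cv(v, n):
    """Probable error of a coefficient of variation v (per cent) from n variates."""
    # The long term of Formula 9, the second of the two tabled factors.
    psi = v * sqrt(1 + 2 * (v / 100) ** 2)
    return 0.6745 / sqrt(2 * n) * psi, psi


pe_cv, psi = pe_of_cv(22.911, 400)
print(round(psi, 4), round(pe_cv, 4))
```

For small coefficients of variation the bracketed term is close to 1, so the probable error is nearly V times the fraction 0.6745 / sqrt(2n); here the bracket adds about 5 per cent.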

Probable error of a difference. Probable errors not only indicate the degree of
confidence which may be placed in a result but are also useful for estimating the
significance of a difference between two similar results. The probable error of the
difference of two results depends on the mathematical theory of least squares and
is the square root of the sum of the squares of the probable errors of the two results.
Thus

$E_d = \sqrt{E_a^2 + E_b^2}$   (10)

where $E_a$ and $E_b$ are the two probable errors of two results and $E_d$ is the probable
error of the difference of these two results.

Example 3. Two determinations of the mean lengths of ear-head in Pusa 4
wheat, from similar samples, gave

M = 8.95 ± 0.04 cm.
and M = 7.88 ± 0.04 cm.

The difference between these two means is 1.07 cm.

and $E_d = \sqrt{(0.04)^2 + (0.04)^2} = 0.057$,

so that the difference is 1.07 ± 0.057 cm. and we see that the difference is
approximately 18 times its probable error.
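
Example 3 can be retraced in a few lines. This is a sketch with assumed names, not a prescribed routine.

```python
from math import sqrt


def pe_of_difference(e_a, e_b):
    """Probable error of the difference of two results (Formula 10)."""
    return sqrt(e_a ** 2 + e_b ** 2)


difference = 8.95 - 7.88
e_d = pe_of_difference(0.04, 0.04)
ratio = difference / e_d

print(round(difference, 2), round(e_d, 3), round(ratio, 1))
```

The ratio lies between 18 and 19 whether the probable error is rounded to 0.057 or carried to more places.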

The probable error is a function of $\sigma$; therefore, 18 times the probable error can be
expressed as a multiple of $\sigma$, and we have seen that when a deviation is expressed as
a multiple of $\sigma$ we can estimate the chances of its occurrence. It is, therefore, also
possible to estimate the probability of a deviation which is expressed in terms of the
probable error. Statisticians have adopted a standard that deviations of three or
more times the probable error are significant of a real difference between the
samples, such as would not be likely to occur by the operation of chance alone. In
example 3 the observed deviation of 18 times the probable error is certainly
significant of a real difference between the two samples, such as would be unlikely from
the results of chance errors. A table showing the probability of occurrence of
deviations of different magnitudes relative to the probable error is given below:


Table IX

Probability of occurrence of statistical deviations of different magnitudes relative to the
probable error

The probable occurrence is that of a deviation as great as or greater than the
expected one, expressed as a percentage; the odds are those against the occurrence
of a deviation as great as or greater than the expected one.

Deviation    Probable      Odds           Deviation    Probable      Odds
divided by   occurrence,   against        divided by   occurrence,   against
probable     per cent                     probable     per cent
error                                     error

1.0          50.00          1.00 : 1      3.1          3.65             26.40 : 1
1.1          45.81          1.18 : 1      3.2          3.09             31.36 : 1
1.2          41.83          1.39 : 1      3.3          2.60             37.46 : 1
1.3          38.06          1.63 : 1      3.4          2.18             44.87 : 1
1.4          34.50          1.90 : 1      3.5          1.82             53.95 : 1
1.5          31.17          2.21 : 1      3.6          1.52             64.79 : 1
1.6          28.05          2.57 : 1      3.7          1.26             78.37 : 1
1.7          25.15          2.98 : 1      3.8          1.04             95.15 : 1
1.8          22.47          3.45 : 1      3.9          0.853           116.23 : 1
1.9          20.00          4.00 : 1      4.0          0.698           142.26 : 1
2.0          17.73          4.64 : 1      4.1          0.569           174.75 : 1
2.1          15.67          5.38 : 1      4.2          0.461           215.92 : 1
2.2          13.78          6.26 : 1      4.3          0.373           267.10 : 1
2.3          12.08          7.28 : 1      4.4          0.300           332.33 : 1
2.4          10.55          8.48 : 1      4.5          0.240           415.67 : 1
2.5           9.18          9.89 : 1      4.6          0.192           519.83 : 1
2.6           7.95         11.58 : 1      4.7          0.152           656.89 : 1
2.7           6.86         13.58 : 1      4.8          0.121           825.45 : 1
2.8           5.90         15.95 : 1      4.9          0.095         1,051.63 : 1
2.9           5.05         18.80 : 1      5.0          0.074         1,350.35 : 1
3.0           4.30         22.26 : 1      6.0          0.0052       19,230.00 : 1

Thus from the above table we see that when

$\dfrac{\text{Dev.}}{\text{P.E.}} = 4$

the probable occurrence in a hundred trials of a deviation as large as or larger than
that observed is 0.698 and therefore the odds against the occurrence of such a
deviation being due to chance alone are 142 : 1.
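
Entries of this kind follow from the normal curve, and the sketch below shows one way of regenerating them. It assumes that the tabled percentage is the two-tailed area beyond the deviation and that the odds are (100 - p) : p; the function names are assumptions of this example.

```python
from math import erfc, sqrt


def chance_per_cent(ratio):
    """Per cent chance of a deviation of at least `ratio` probable errors, either sign."""
    z = 0.6745 * ratio            # the same deviation in units of sigma
    return 100 * erfc(z / sqrt(2))


def odds_against(ratio):
    """Odds against such a deviation, expressed as x : 1."""
    p = chance_per_cent(ratio)
    return (100 - p) / p


for ratio in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(ratio, round(chance_per_cent(ratio), 3), round(odds_against(ratio), 2))
```

Because the odds are derived from a small percentage, a change in its last figure moves the odds appreciably at the foot of the table, so recomputed odds may differ slightly from those printed.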

If we are confronted by a number of means and their probable errors then the
probable error of the average of these means is given by the formula

P.E. of an average of means $= \dfrac{\sqrt{a^2 + b^2 + c^2 + \dots}}{N}$   (11)

where $N$ = the number of separate means and $a, b, c, \dots$ represent the separate
probable errors.

Example 4. The average length of ears of Pusa 4 wheat during the period
1925-26 to 1931-32 was

Year        Mean length of ear in cms.

1925-26     8.14 ± 0.04
1926-27     8.19 ± 0.03
1927-28     8.65 ± 0.03
1928-29     8.33 ± 0.03
1929-30     7.83 ± 0.03
1930-31     8.95 ± 0.04
1931-32     7.88 ± 0.04

Average of 7 years = 8.28

P.E. of the average

$= \dfrac{\sqrt{(0.04)^2 + (0.03)^2 + (0.03)^2 + (0.03)^2 + (0.03)^2 + (0.04)^2 + (0.04)^2}}{7}$

$= 0.0131$

That is, the average length of ear of Pusa 4 wheat in these 7 years was 8.28 ± 0.0131 cms.
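
The arithmetic of Example 4 is repeated in the sketch below, with assumed variable names.

```python
from math import sqrt

# Mean ear length (cms.) and its probable error for each of the seven years.
records = [(8.14, 0.04), (8.19, 0.03), (8.65, 0.03), (8.33, 0.03),
           (7.83, 0.03), (8.95, 0.04), (7.88, 0.04)]
N = len(records)

average = sum(m for m, _ in records) / N
# Formula 11: root of the summed squares of the probable errors, divided by N.
pe_average = sqrt(sum(e ** 2 for _, e in records)) / N

print(round(average, 2), round(pe_average, 4))
```

The probable error of the average is about a third of that of a single year, as is to be expected when seven determinations of similar accuracy are combined.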


Probable error of an observed probability 

Two or more events are mutually exclusive if the occurrence of one excludes
the occurrence of the other. Thus, if a coin is tossed upon a flat table it will fall
in one of two ways, either head up or tail up; both ways are equally likely and
with a single coin both ways cannot occur at one throw. The two events are, therefore,
mutually exclusive, and the probability of each event is 0.5, since there are
only two possibilities, heads or tails, and one of them must occur. If p is the
probability that an event will occur and q the probability that it will not occur,
expressed as decimal fractions, then in the case of two mutually exclusive events

$$p + q = 1 \qquad (12).$$

If n be the total number of times of occurrence then the probable error of the
probability is given by

$$E_p = 0.6745\sqrt{\frac{p\,q}{n}} \qquad (13).$$

This formula is sometimes of use in testing the accuracy of Mendelian ratios.
Example 5. In an F2 population of 1,100 chillies we observed

    Purple : Non-purple
       831 : 269

Therefore, the observed ratio is 831/1100 : 269/1100 = 0.756 : 0.244.

The theoretical expectation of the ratio is 0.75 : 0.25, therefore, the deviation of the
observed from the expected ratio is 0.006 and the probable error of the probability
is given by

$$E_p = 0.6745\sqrt{\frac{0.75 \times 0.25}{1100}} = 0.0088$$

The ratio of deviation to the probable error is

$$\frac{0.0060}{0.0088} = 0.68.$$

From Table IX, we see that when this ratio is 0.68 the odds against the occurrence of a
deviation as great as or greater than the observed are less than 1 : 1; we, therefore,
conclude that the observed ratio is not significantly different from the theoretical
expectation.

The closeness of agreement between observation and theory in the case of Mendelian
populations is generally determined directly from the class frequencies, in
which case the above formula is slightly modified, the probable error of a class
frequency being given by

$$\text{P.E.} = 0.6745\sqrt{p\,q\,n} \qquad (14)$$

Applying this formula to example 5 we get

$$\text{P.E.} = 0.6745\sqrt{0.75 \times 0.25 \times 1100} = 9.6845$$

The frequencies are

                           Purple : Non-purple

    Observed                  831 : 269
    Expected on 3 : 1         825 : 275

Deviation = 6 and

$$\frac{\text{Dev.}}{\text{P.E.}} = \frac{6}{9.6845} = 0.62$$
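
As a check on formulae (13) and (14), the sketch below recomputes both probable errors for the chilli data of Example 5. The function names are illustrative; the only inputs are the expected proportions 0.75 and 0.25 and the population of 1,100 given in the text.

```python
import math

PE_FACTOR = 0.6745


def pe_of_proportion(p, n):
    """Probable error of an observed probability, formula (13)."""
    q = 1.0 - p
    return PE_FACTOR * math.sqrt(p * q / n)


def pe_of_class_frequency(p, n):
    """Probable error of a class frequency, formula (14)."""
    q = 1.0 - p
    return PE_FACTOR * math.sqrt(p * q * n)


n = 1100
pe_prop = pe_of_proportion(0.75, n)
pe_freq = pe_of_class_frequency(0.75, n)

print(f"P.E. of proportion      = {pe_prop:.4f}")
print(f"P.E. of class frequency = {pe_freq:.2f}")
print(f"Dev./P.E. (frequencies) = {6 / pe_freq:.2f}")   # deviation of 6 plants
```

The two formulae differ only by the factor n, so the ratio of deviation to probable error should come out nearly the same whichever is used; the small discrepancy between 0.68 and 0.62 in the text appears to arise from rounding the observed proportion to 0.756.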





In this case the deviation is less than the probable error and is such as can be
ascribed to a chance error of the experiment.

A word of caution is necessary in the use of these formulae. They are only
valid when neither p nor q is very small, and in more complicated Mendelian ratios
one of these terms may be represented by a very small fraction, in which case it is
useless to apply these formulae unless n is very large.

Moreover, it is theoretically unjustifiable to test the validity of a given ratio by
the determination of the probable error of one or all of its individual
groups; the random deviations of the class frequencies are not independent but
correlated.


GOODNESS OF FIT


A criterion of the goodness of fit of observations to expectations, which is not
open to the above objections, is the application of what is known as the Chi-square
test. $\chi^2$ is calculated from the following formula

$$\chi^2 = \sum \frac{(O - C)^2}{C}$$

where O = the observed frequency of a class,

      C = the calculated frequency of a class;

and $\sum$ indicates summation of the term for all classes in the ratio.

This method has the advantage that it gives a measure of the goodness of fit
of the ratio as a whole and allows of the class frequencies which deviate most
from expectation being determined at a glance from the value of $(O - C)^2 / C$.

The distribution of $\chi^2$ corresponding to the number (n) of classes in the
distribution was worked out by Pearson, and a table was constructed (Elderton) showing
the probability of an observed value of $\chi^2$ being given by a random sample from
a hypothetical population. That is to say, from the value of $\chi^2$ we can deduce
the probability that random sampling would lead to as large a deviation or a larger
deviation between theory and observation. Thus, with 18 classes and a calculated
value of $\chi^2$ = 10, we find from the table that P = 0.905 and conclude that, in 90
cases out of a hundred trials, the chance errors of random sampling would give a
deviation as large as or larger than that observed; we may, therefore, say that the
agreement between observation and theory in such a case is excellent.

Example 6. In the F2 population of a cross between type 22 linseed which
has a blue petal and type 12 which has a white petal, the following frequencies
were observed [Shaw, Khan and Alam, 1931]:

                                  Frequency
                          Observed    Expected on
                            (O)       9 : 3 : 3 : 1 (C)     O - C       (O - C)^2 / C

    Blue like F1            205         217.6875           -12.6875        0.7395
    Blue like type 22        80          72.5625             7.4375        0.7623
    White like type 12       76          72.5625             3.4375        0.1628
    White                    26          24.1875             1.8125        0.1358

                                                            $\chi^2$ =     1.8004

Referring to Pearson's Table XII we find that when n = 4 and $\chi^2$ = 1, P is
0.8013 and when $\chi^2$ = 2, P is 0.5724; by interpolation we find the value of P
corresponding to $\chi^2$ = 1.80 is 0.6182. We conclude, therefore, that in about 62
cases in a hundred trials, chance errors of random sampling would give deviations as
large as those observed and that the fit of observation to theory is satisfactory.
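
The working of Example 6 can be sketched in code as follows. The observed frequencies 205, 80, 76 and 26 are an assumption in the sense that they are recovered here from the expected values and deviations of the example; the closed-form expression used for P holds only for four classes (three degrees of freedom) and is offered as a check on the tabular interpolation, not as the method of the text.

```python
import math


def chi_square(observed, ratio):
    """Sum of (O - C)^2 / C, with C obtained by sharing the total
    population among the classes in the given Mendelian ratio."""
    n = sum(observed)
    parts = sum(ratio)
    expected = [n * r / parts for r in ratio]
    return sum((o - c) ** 2 / c for o, c in zip(observed, expected))


def p_value_three_df(x):
    """Probability of a chi-square at least as large as x when there are
    four classes (three degrees of freedom); exact closed form."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


observed = [205, 80, 76, 26]          # recovered as expected + deviation
chi2 = chi_square(observed, [9, 3, 3, 1])
print(f"chi-square = {chi2:.4f}, P = {p_value_three_df(chi2):.3f}")
```

The exact value of P comes out a little below the interpolated figure, as is to be expected when a curved function is interpolated linearly between two widely spaced tabular entries.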

Fisher, in his Table III, has published the $\chi^2$ table in a slightly different form in
which values of P corresponding to values of $\chi^2$ smaller than one are tabulated and
in which values of $\chi^2$ corresponding to specially selected values of P are given.
Fisher's Table is entered with n equal to one less than the number of classes in the
distribution; this value is known as the degrees of freedom and is explained in a
later chapter.

Example 7. In the F2 population of a cross between type 3 chilli which is
purple in colour and type 29 which is green, the following frequencies were observed
[Deshpande, 1933]

                                  Frequency
                          Observed    Expected on
                            (O)       1 : 3 : 8 : 4 (C)     O - C       (O - C)^2 / C

    Purple, deep             65          68.75              -3.75          0.2045
    Purple, medium          203         206.25              -3.25          0.0512
    Purple, light           563         550.00             +13.00          0.3073
    Green                   269         275.00              -6.00          0.1309

                                                            $\chi^2$ =     0.6939

From Fisher's Tables, entering with n = 3, we find that this value of $\chi^2$ lies
between P = 0.90 and P = 0.80 and we, therefore, conclude that the fit is good.


The probability that two given samples belong to the same universe 


In many cases of statistical investigation, we have to deal with samples drawn 
from different places and it is necessary to decide whether they belong to the same 
universe or not. Suppose that we have two samples of spores of the same species 
of fungus from two different localities. The frequency distribution of length in 
each sample may show differences between the two samples in respect to this 
character and it is necessary to be able to decide whether the observed difference 
is of such an order as to preclude the two samples belonging to the same universe. 

A fair idea of the nature of the discrepancy between two given frequency distributions
can be had by comparing the means, the standard deviations and other
statistical constants of the two samples in question. Let $m_1, \sigma_1, N_1$ and
$m_2, \sigma_2, N_2$ be the means, the standard deviations and the sizes of the two
samples. If the differences between the means or the standard deviations are not
significant, we say that the two samples belong to the same population, and if all the
other statistical constants such as coefficient of variability, etc., also do not show any
significant difference then the chance of two samples belonging to the same
population is greater still.

It is known that the mean, the standard deviation and other statistical constants
of a frequency distribution summarise the most important properties of the distribution.
But an index which takes into account the differences in frequencies
of two samples for the same interval is bound to afford a better estimate for a
comparison between them. In the $\chi^2$ test our object is to see whether a frequency
distribution differs widely from a theoretical one. The expected values of the
different class intervals are calculated and an index based upon the differences of
the expected and actual frequencies for the respective class intervals enables us to
determine the probability that the given distribution can be represented by a known
theoretical distribution. The use of the $\chi^2$ table has been fully explained in the
previous section of this chapter.

By using a principle similar to the one explained above, Pearson [Biometrika,
Vol. VIII, 1911] showed that the probability of two samples of sizes N and N'
belonging to the same population can be determined from the $\chi^2$ table by evaluating

$$\chi^2 = \frac{NN'}{N + N'} \sum_s \frac{1}{p_s}\left(\frac{f_s}{N} - \frac{f_s'}{N'}\right)^2, \quad \text{where } p_s = \frac{f_s + f_s'}{N + N'}$$

and $f_1, f_2, \dots, f_s$ and $f_1', f_2', \dots, f_s'$ are the frequencies of the samples
for corresponding class intervals. But later he finds [Biometrika, Vol. XXIV, 1932] that
$p_s$ is the probability that any selection will fall in the s-th class frequency as
indicated by the two samples together. If the samples are considerable in size, the
assumed value of $p_s$ is correct. The biologist is generally dealing with small samples
and Pearson has found [1932] that the best way to estimate the value of the probability
that the two samples belong to the same universe is by evaluating


NN' I ff, 

N + N' ^ \N 





...{WA) 


where N and N' = the sizes of the two samples and $f_s$ and $f_s'$ the frequencies of
the two samples for the corresponding class intervals. Knowing the value of $\chi^2$
for any two particular distributions, the maximum probability of the two samples
belonging to the same population can be determined from the $\chi^2$ table.

Table X gives the frequency distribution of the length of each of the two 
hundred spores of Peshawar and Karnal strains of Tilletia indica. 


Table X


Frequency distributions of the length of spores


    Spore length          Frequency of T. indica
        (μ)          Karnal strain    Peshawar strain

         25                 2                 0
         28                 4                 3
         31                20                 4
         34                24                81
         37                66                91
         40                43                 3
         43                28                14
         46                 5                 4
         49                 6                 0
         52                 0                 0
         55                 2                 0

       Total            200 (N)           200 (N')


The means, the standard deviations and the coefficient of variation of the 
measurements of the two strains of spores are given in Table XI. 


Table XI


Biometrical constants of the measurements of spores


                                      Length in μ             Ratio of difference
    Constants                    Peshawar       Karnal        to S. error
                                 strain         strain        of difference

    Means                         36.175        37.990            4.6105
    Standard deviation             2.9024        4.7508           6.6394
    Coefficient of variation       8.0232       12.5054           5.9564
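
The first line of Table XI can be verified from the frequencies of Table X. The sketch below assumes the class values run from 25 to 55 μ in steps of 3 μ, with the frequencies as read from Table X, and takes the standard error of a mean as the standard deviation divided by the square root of the sample size; the standard deviations are copied from Table XI rather than recomputed.

```python
import math

lengths  = [25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55]   # class values in microns
karnal   = [2, 4, 20, 24, 66, 43, 28, 5, 6, 0, 2]          # frequencies, N = 200
peshawar = [0, 3, 4, 81, 91, 3, 14, 4, 0, 0, 0]            # frequencies, N' = 200


def grouped_mean(values, freqs):
    """Mean of a frequency distribution."""
    return sum(v * f for v, f in zip(values, freqs)) / sum(freqs)


mean_k = grouped_mean(lengths, karnal)
mean_p = grouped_mean(lengths, peshawar)

sd_k, sd_p = 4.7508, 2.9024            # standard deviations quoted in Table XI
se_k = sd_k / math.sqrt(sum(karnal))   # standard error of each mean
se_p = sd_p / math.sqrt(sum(peshawar))

se_diff = math.sqrt(se_k ** 2 + se_p ** 2)
ratio = (mean_k - mean_p) / se_diff

print(f"means: Karnal {mean_k:.3f}, Peshawar {mean_p:.3f}")
print(f"difference / S.E. of difference = {ratio:.4f}")
```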






A comparison of the means, the standard deviations and the coefficients of variation
of the two strains in respect to length shows that the differences
are statistically significant and hence the probability that the two strains belong to
the same group or population is small.

Now let us apply the method developed by Karl Pearson to the data.
For this we have to evaluate the expression given above. In this particular case
N = N' = 200, hence the expression reduces to a simpler form.

The data are shown in detail in Table XII.


Table XII


Frequency of Karnal and Peshawar strains of Tilletia indica


    Length (μ)     Karnal strain $f_s$     Peshawar strain $f_s'$     Column 4

        25                  2                       0                [illegible]
        28                  4                       3                [illegible]
        31                 20                       4                [illegible]
        34                 24                      81                [illegible]
        37                 66                      91                [illegible]
        40                 43                       3                [illegible]
        43                 28                      14                [illegible]
        46                  5                       4                [illegible]
        49                  6                       0                [illegible]
        52                  0                       0                [illegible]
        55                  2                       0                [illegible]

    Algebraic sum of column 4 = [illegible]

and the probability for this value of $\chi^2$ (about 67), with n = 11, is 0.000000,
showing that the two strains belong to entirely different populations.

The method indicated above holds good when we are dealing only with one
single character, viz., length in this case. But when a sample has got a number of
characters such as length, breadth, height, etc., a rough way of testing the
universality of the two samples is to evaluate, for each character in turn, the
expression mentioned above, and if for every one of the characters the probability of
the two samples being taken from the same population is high, then we can safely assert
that the samples belong to one and the same universe. But this need not be the
case in the statistical problems that frequently occur in biological work. The
probability for two of the characters might be high and it might be low for a third
character. For such cases Karl Pearson has developed another index known as
the Coefficient of Racial Likeness based upon all the characters in consideration
[vide Biometrika, Vol. XVIII]. This has been further developed by Professor P. C.
Mahalanobis in his article on Tests and Measures of Group Divergence, published
in the Journal and Proceedings of the Asiatic Society of Bengal, Vol. XXVI,
1930, No. 4. The student interested in further study of such problems is referred
to the above mentioned original articles.


Linkage values 

In Mendelian studies it frequently happens that the inheritance of two pairs
of characters does not conform to the normal expectation but that certain combinations
of characters occur more frequently than should be the case according to the
ordinary expectations of independent segregation. In such cases characters are
said to be linked together. Although a description of the methods and significance
of linkage in inheritance is outside the scope of this book, yet since various
mathematical methods of calculating the strength or intensity of linkage from observed
phenotypic frequencies in F2 have been evolved, a few examples of the calculation
of linkage values are included here.

The intensity of linkage between two characters can obviously be estimated
from the ratio between the number of occasions on which they occur together to
the number of occasions on which they occur separately, or in other words, the
probability of the two characters occurring separately, expressed as a percentage,
will be an estimate of the linkage value. The separation of two characters which
are usually linked together depends upon the crossing over from one chromosome
to another of a gene carrying one of the characters, hence the percentage probability
of the separation of two characters is generally called the cross-over value.

Example 8. A few methods of calculating linkage values may be illustrated
from an observed case of linkage in linseed. Linseed type 11 has a deep lilac petal
and a deep purple stigma and linseed type 121 has a lilac petal and a white stigma
[Shaw et al., 1931]. In F1 the type 121 phenotype was dominant and in F2 the
following frequencies were observed:

                                      Lilac petal              Deep lilac petal
                                   White       Purple         White       Purple
                                   stigma      stigma         stigma      stigma
                                     AB          Ab             aB          ab

    Observed                        357          37             33          94
    Expected on 9 : 3 : 3 : 1       293.06       97.68          97.68       32.56

From an inspection of these frequencies it is obvious that the parental combinations
occur more frequently than should be the case on a 9 : 3 : 3 : 1 ratio,
but when either the petal or the stigma characters are taken separately a good
3 : 1 fit is obtained, e.g., white stigmas 390, purple stigmas 131.

Additive method. This method of calculating the linkage from observed phenotypic
frequencies in F2 is, as its name implies, based upon the summation of class
frequencies. A simple formula is that of Emerson:

$$p^2 = \frac{E - M}{n} \qquad (16)$$

where E = Sum of the frequencies of the two end classes (the double dominant and
          the double recessive)

      M = Sum of the frequencies of the two middle classes (single dominants)

      n = Total population

and 1 - p = the percentage of crossing-over.

Applying this formula to the data of linseed above, we get

$$p^2 = \frac{(357 + 94) - (33 + 37)}{521} = \frac{381}{521} = 0.73$$

$$p = \sqrt{0.73} = 0.854$$

$$1 - p = 1 - 0.854 = 0.146,$$

therefore, the cross-over value is 14.6 per cent, which represents the chance of
occurrence of the gametes in which the linkage has broken down. The gametic
ratio, therefore, is

                       AB       Ab       aB       ab

                      42.7      7.3      7.3     42.7

    or approximately    6   :    1   :    1   :    6
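
Emerson's additive formula is easily mechanised. The following sketch uses the four linseed frequencies of Example 8; the function and argument names are illustrative only, and the result differs from the text in the third decimal because the text rounds $p^2$ to 0.73 before taking the square root.

```python
import math


def emerson_additive(double_dominant, single_a, single_b, double_recessive):
    """Emerson's additive method: p^2 = (E - M) / n, where E is the sum of the
    two end classes and M the sum of the two middle classes.
    Returns (p squared, p, cross-over fraction 1 - p)."""
    n = double_dominant + single_a + single_b + double_recessive
    e = double_dominant + double_recessive
    m = single_a + single_b
    p_squared = (e - m) / n
    p = math.sqrt(p_squared)
    return p_squared, p, 1.0 - p


p_squared, p, crossover = emerson_additive(357, 37, 33, 94)
print(f"p^2 = {p_squared:.4f}, p = {p:.4f}, cross-over = {100 * crossover:.1f} per cent")

# Gametic percentages: parental types p/2 each, cross-over types (1 - p)/2 each
print(f"gametes AB : Ab : aB : ab = {50 * p:.1f} : {50 * crossover:.1f} : "
      f"{50 * crossover:.1f} : {50 * p:.1f}")
```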

Product ratio method. This method, as its name implies, depends upon the ratio
of the product of the two end classes to that of the two middle classes.

$$P = \frac{ad}{bc} \qquad (17)$$

where a = the frequency of the double dominant class
      d = the frequency of the double recessive class
and b and c = the frequency of the single dominant classes

$$p^2 = \frac{P + 1 - \sqrt{3P + 1}}{P - 1} \qquad (18)$$

Applying these formulae to the data from linseed, we get

$$P = \frac{357 \times 94}{33 \times 37} = 27.48$$

$$p^2 = \frac{27.48 + 1 - \sqrt{(3 \times 27.48) + 1}}{27.48 - 1} = 0.728.$$

The value of $p^2$ agrees with that obtained by the additive method.
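
Formulae (17) and (18) may be checked in the same way. This is a sketch under the assumption that the F2 classes are in the coupling phase, as in the linseed example; the function name is illustrative. Carried to full precision the computation gives a value of about 0.731, close to the figure in the text and to that of the additive method.

```python
import math


def product_ratio_p_squared(a, b, c, d):
    """Product ratio method: P = ad / bc, then
    p^2 = (P + 1 - sqrt(3P + 1)) / (P - 1).
    a = double dominant, d = double recessive, b and c = single dominants."""
    big_p = (a * d) / (b * c)
    p_squared = (big_p + 1.0 - math.sqrt(3.0 * big_p + 1.0)) / (big_p - 1.0)
    return big_p, p_squared


big_p, p_squared_product = product_ratio_p_squared(357, 37, 33, 94)
print(f"P = {big_p:.2f}, p^2 = {p_squared_product:.4f}")
print(f"cross-over = {100 * (1 - math.sqrt(p_squared_product)):.1f} per cent")
```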

Another method of calculating linkage values is based upon the square root of 
the proportional frequency of the double recessive. Since the double recessive is, 
in the normal dihybrid ratio, represented by only one individual it is obvious that 
the square root of the proportional frequency, expressed as a decimal fraction, of 
the double recessive, will give directly the probability of the occurrence of the double
recessive gamete. Table XIII shows how the data are arranged for calculation by 
this method ; — 


Table XIII


Computation of linkage value from data of flower colour in linseed type 121 × type 11


    Phenotypes                     F2         Observed     9 : 3 : 3 : 1   Deviation   Adjusted     Calculated
                                   frequency  proportions  proportions                 proportions  frequency

    1                              2          3            4               5           6            7

    Lilac petal and                357        0.6852       0.5625          0.1227      0.6828       355.74
    white stigma.

    Lilac petal and                 37        0.0710       0.1875          0.1165      0.0672        35.01
    purple stigma.

    Deep lilac petal                33        0.0633       0.1875          0.1242      0.0672        35.01
    and white stigma.

    Deep lilac petal                94        0.1804       0.0625          0.1179      0.1828        95.24
    and purple stigma.

    Average deviation                                                      0.1203


The adjusted proportions in column 6 of the above table are obtained by adding
the average deviation to the first and last terms and subtracting it from the two
middle terms in column 4 which contains the normal proportions. The probability
of occurrence of the double recessive gamete will be the square root of the adjusted
proportionate occurrence of the double recessive phenotype. In this case
$\sqrt{0.1828}$ = 0.42755, therefore, the frequency of the double recessive gamete is
about 42.8 per cent, a result which agrees with that previously calculated. The
calculated frequencies in column 7 are obtained by multiplying the observed frequencies
in the various classes by their respective adjusted proportions and dividing this
product by the observed proportions. Thus, in the last phenotypic class, the calculated
frequency is

$$\frac{94 \times 0.1828}{0.1804} \text{ or } 95.24.$$

A full description of the methods of computing linkage values has recently been
published by Alam [1929].

CHAPTER V
PROBABILITY INTEGRAL

If we consider the diagram (fig. 4) of a normal curve, we see that the upper quartile
divides the curve into two areas, 75 per cent of the total area of the curve being
on the left of the quartile and 25 per cent on its right. If M is the value of the mean
and x represents the deviation of the upper quartile from the mean, then the value
of the variate at the ordinate erected at the upper quartile is M + x. All items in the
distribution of a size as large as or larger than M + x will occur in the portion of the
curve included in the smaller area, which is 25 per cent of the total area enclosed by
the curve. Since this smaller area is to the remaining area of the curve in the
proportion of 25 : 75, it is clear that the odds against the occurrence of an item as
large as M + x or larger will be as 3 : 1. The student should particularly note that
the odds are quoted against the occurrence of a particular value; the chance of
occurrence of a value of M + x, or larger, is of course, as 1 : 3.

In this particular example the deviation from the mean, x, is equal to the distance
of the upper quartile from the mean and we have already shown that a deviation of
this size is equal to 0.6745 $\sigma$. It follows, therefore, that in this case

$$x = 0.6745\,\sigma \quad \text{and} \quad \frac{x}{\sigma} = 0.6745.$$

We infer, therefore, that in a frequency distribution which follows the normal curve,
when the deviation, x, from the mean divided by $\sigma$ gives a quotient of 0.6745,
the ordinate erected at x will divide the area of the curve in the ratio of 3 : 1. It
is possible to calculate the proportions into which the area of the normal curve will
be divided by ordinates erected at any deviations from the mean, these deviations
being expressed in terms of $\sigma$. In other words, the ratio $x/\sigma$ will for
normal distributions bear a definite relation to the proportions into which the curve is
divided by the ordinate erected at the deviation x. Mathematicians have, for the normal
curve, constructed tables giving for all values of $x/\sigma$ the proportionate areas
cut off by the ordinates at those points; such tables are called tables of the probability
integrals. Hence, for any value of $x/\sigma$, where x is the deviation from the mean,
we can estimate the chances of occurrence of a deviation of this size or greater.
Pearson's Table II (Vol. I) gives all values of $x/\sigma$ from 0 to 6.00 and the ratio of
the corresponding areas into which the curve is divided. The larger of the two areas
into which the curve is divided by the ordinate at $x/\sigma$ is given in the column
$\tfrac{1}{2}(1 + \alpha)$, and the smaller area is determined by subtracting this from 1.
The probability integral may be defined as the proportion of the area lying under the
curve between the lower extremity and the ordinate at any given value.





Example 9. The mean height in a sample of 1,000 maize plants is 64.64 inches
with a standard deviation of 2.7. What are the chances of the occurrence of a
plant 70.04 inches or more?

$$x = 70.04 - 64.64 = 5.4 \text{ inches.}$$

$$\frac{x}{\sigma} = \frac{5.4}{2.7} = 2$$

From Pearson's Table II, we find that when the value $x/\sigma$ is 2, the area of the
curve cut off by the ordinate at the deviation + 5.4 from the mean is 0.9772 of the whole
area of the curve. The curve, therefore, is divided into two parts in the ratio of
0.9772 : (1 - 0.9772) or 97.72 : 2.28. The chances of occurrence of a plant of 70.04
inches or larger are about two and a quarter in a hundred. In other words, the
odds against the occurrence of a variate 70.04 inches or larger are given by the ratio

    9772 : 228

or approximately 43 : 1 against the occurrence.
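
Where a table of the probability integral is not at hand, the same area can be obtained from the error function. The sketch below reworks Example 9 for one tail of the curve; the helper name `normal_cdf` is illustrative.

```python
import math


def normal_cdf(z):
    """Proportion of the area of the normal curve lying below the ordinate
    erected at a deviation of z standard deviations above the mean."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mean_height, sd, height = 64.64, 2.7, 70.04

z = (height - mean_height) / sd          # deviation in units of sigma, here 2
greater_area = normal_cdf(z)             # the tabulated quantity, about 0.9772
one_tail = 1.0 - greater_area            # chance of a plant of 70.04 inches or more

print(f"x / sigma = {z:.2f}")
print(f"area below the ordinate = {greater_area:.4f}, one tail = {one_tail:.4f}")
print(f"odds against = {greater_area / one_tail:.0f} : 1")
```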

If we are considering the chances of occurrence of individuals deviating from the
mean by 5.4 inches or more, the odds will be different, since in this case we are
dealing with deviations both above and below the mean; that is to say, we are
considering the chances of occurrence of individuals of 70.04 inches or more and of
59.24 inches or less. Figure 8 will make this clear; the deviation x, 5.4 inches, has
a positive value on the right hand side of the mean and a negative value on the left
hand side of the mean, the ordinates erected at B and A marking respectively the
positive and negative values of the deviation.

[Fig. 8. Probability integral: normal curve with ordinates erected at A (59.24) and at B]

Now we have seen that the ordinate at B divides the area of the curve into two
areas in the ratio of 0.9772 : 0.0228 and similarly the ordinate at A will also divide
the total area of the curve in these same proportions. It follows, therefore, that
each tail of the curve, that is to say the smaller areas lying respectively to the right
and left of the ordinates at B and A, is 0.0228 of the whole area, and the proportion
of the total area between the ordinates at A and B is 1 - (2 × 0.0228) or 0.9544.
The odds against the occurrence, therefore, of a variate differing from the mean by
5.4 inches will be the ratio of the area of the curve between the ordinates at A
and B to the sum of the areas of the two tails, that is to say as 0.9544 : 0.0456 or
approximately as 21 : 1. The ordinates at A and B cut off two tails which are
approximately 5 per cent of the total area of the curve.

The student must distinguish carefully between problems which involve the area 
of one tail of the curve relative to the remainder and problems which involve the 
sum of the areas of both tails relative to the central area. 

Table XIV shows for a few values of $x/\sigma$ the relative proportions of the areas into
which the curve is divided by the ordinate at x and the odds against the occurrence
of a deviation of ± x. Thus, when $x/\sigma$ = 1, the sum of the areas in both tails of
the curve is 0.31732 and the area lying between the tails is 0.68268 and the odds
against the occurrence of a deviation of ± x are as 68268 : 31732 or approximately
as 68 : 32 which is about 2.12 : 1.
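
The three areas tabulated for each value of $x/\sigma$ follow from one another, as the sketch below illustrates. It is an assumed reconstruction of how such a row may be computed, not the method by which the printed table was prepared; in particular the printed odds appear to be worked from percentages rounded to whole numbers, so they differ slightly from the unrounded odds given here.

```python
import math


def tail_areas(z):
    """For a deviation of z standard deviations return
    (greater fraction of area, area in one tail, area in two tails)."""
    greater = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    one_tail = 1.0 - greater
    return greater, one_tail, 2.0 * one_tail


def odds_against_two_sided(z):
    """Odds against a deviation of at least z sigma in either direction."""
    _, _, two_tails = tail_areas(z)
    return (1.0 - two_tails) / two_tails


for z in (0.5, 1.0, 1.5, 2.0, 2.5):
    greater, one_tail, two_tails = tail_areas(z)
    print(f"{z:3.1f}  {greater:.5f}  {one_tail:.5f}  {two_tails:.5f}  "
          f"{odds_against_two_sided(z):6.2f} : 1")
```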


Table XIV


Showing the greater portion of the area of a normal curve of errors to one side of
an ordinate at the abscissa x/σ together with its relation to P (value of two tails)


     x/σ     Greater      Area in     Area in        Odds against occurrence
             fraction     one tail    two tails      of a ± deviate x/σ
             of area                  or P

     0       0.50000      0.50000     1.00000         0 : 100        ..
     0.1     0.53983      0.46017     0.92034         8 : 92       0.087 : 1
     0.2     0.57926      0.42074     0.84148        16 : 84       0.190 : 1
     0.3     0.61791      0.38209     0.76418        24 : 76       0.316 : 1
     0.4     0.65542      0.34458     0.68916        31 : 69       0.449 : 1
     0.5     0.69146      0.30854     0.61708        38 : 62       0.613 : 1
     0.6     0.72575      0.27425     0.54850        45 : 55       0.818 : 1
     0.7     0.75804      0.24196     0.48392        52 : 48       1.083 : 1
     0.8     0.78814      0.21186     0.42372        58 : 42       1.381 : 1
     0.9     0.81594      0.18406     0.36812        63 : 37       1.703 : 1
     1.0     0.84134      0.15866     0.31732        68 : 32       2.12 : 1
     1.1     0.86433      0.13567     0.27134        73 : 27       2.71 : 1
     1.2     0.88493      0.11507     0.23014        77 : 23       3.35 : 1
     1.3     0.90320      0.09680     0.19360        81 : 19       4.26 : 1
     1.4     0.91924      0.08076     0.16152        84 : 16       5.25 : 1
     1.5     0.93319      0.06681     0.13362        87 : 13       6.69 : 1
     1.6     0.94520      0.05480     0.10960        89 : 11       8.09 : 1
     1.7     0.95543      0.04457     0.08914        91 : 9       10.10 : 1
     1.8     0.96407      0.03593     0.07186        93 : 7       13.28 : 1
     1.9     0.97128      0.02872     0.05744        94 : 6       15.67 : 1
     2.0     0.97725      0.02275     0.04550        95 : 5       19.0 : 1
     2.1     0.98214      0.01786     0.03572        96 : 4       24.0 : 1
     2.2     0.98610      0.01390     0.02780        97 : 3       32.33 : 1
     2.3     0.98928      0.01072     0.02144        98 : 2       49.0 : 1
     2.4     0.99180      0.00820     0.01640        98 : 2       49.0 : 1
     2.5     0.99379      0.00621     0.01242        99 : 1       99.0 : 1


The following example will serve to familiarise the student with the use of the
probability integral table.


Example 10. The mean length of ear-head in a sample of 400 ears of Pusa 12
wheat is 9.9775 cms. with a standard deviation of 1.4408 cms. What are the chances
of the occurrence of:

    (1) a head of 12.1282 cms. or more,

    (2) a head of 6.5364 cms. or less,

and (3) a head deviating from the mean by ± 2.58084 cms.?

It will be seen that

(1) $\dfrac{x}{\sigma} = \dfrac{2.1507}{1.4408} = 1.4927.$

    From table of probability integral $\tfrac{1}{2}(1 + \alpha)$ = 0.93224.
    Area in one tail = 0.06776.
    Odds approximately 93 : 7 or 14 : 1 against.

(2) $\dfrac{x}{\sigma} = \dfrac{3.4411}{1.4408} = 2.3883.$

    Area in one tail = 0.00846.
    Odds approximately 992 : 8 or 124 : 1 against.

(3) $\dfrac{x}{\sigma} = \dfrac{2.58084}{1.4408} = 1.7915$

    $\tfrac{1}{2}(1 + \alpha)$ = 0.96335
    Area in two tails = 0.07330
    Area between the tails = 0.9267 or (1 - 0.07330)
    Odds approximately 927 : 73 or 12.7 : 1 against.
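
The three parts of Example 10 may be checked as follows. The sketch computes the areas directly from the error function rather than by reading a table, so the last decimal may differ slightly from the tabular values; the variable names are illustrative.

```python
import math


def upper_tail(z):
    """Area of the normal curve beyond a deviation of z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


mean_length, sd = 9.9775, 1.4408

# (1) a head of 12.1282 cms. or more: one tail above the mean
tail_1 = upper_tail((12.1282 - mean_length) / sd)

# (2) a head of 6.5364 cms. or less: one tail below the mean
tail_2 = upper_tail((mean_length - 6.5364) / sd)

# (3) a head deviating from the mean by 2.58084 cms. either way: both tails
tails_3 = 2.0 * upper_tail(2.58084 / sd)

print(f"(1) one tail  = {tail_1:.5f}, odds {(1 - tail_1) / tail_1:.1f} : 1 against")
print(f"(2) one tail  = {tail_2:.5f}, odds {(1 - tail_2) / tail_2:.0f} : 1 against")
print(f"(3) two tails = {tails_3:.5f}, odds {(1 - tails_3) / tails_3:.1f} : 1 against")
```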


The probability integral is the fundamental basis for a large part of statistical
work. The examples which we have dealt with up to now have been taken from
samples which are relatively large and which closely approximate to a normal
distribution. The special methods necessary for dealing with small samples in which
the distribution may depart from the normal are dealt with in subsequent chapters.



CHAPTER VI


SIGNIFICANCE OF DIFFERENCES BETWEEN MEANS 


One of the most frequent problems in biological work is the comparison of the
mean values of different samples. We may, for instance, determine the mean length 
of ear-heads in different varieties of wheat and attempt to estimate the significance 
of the differences between the various means, or a more common problem is the 
determination of the significance of the differences between the mean yields of 
different varieties. In any branch of experimental science it is usual to repeat a 
particular experiment several times and to form an estimate of the reliability of the 
result from a comparison of the amount of agreement between the repetitions ; such 
an estimate may be furnished by the size of the difference between the largest and 
smallest determination relative to the size of the determinations themselves. This 
practice is sufficient in experimental work in physical and chemical sciences in 
which the conditions of the experiment are more rigidly under control than in biologi- 
cal work and in which, therefore, the fluctuations between the results of repetitions 
of the same experiment are generally insignificant. In experimental work in plant 
breeding and agriculture, the material of the experiment is a living plant subject to 
the uncontrolled fluctuations of soil and climate and the comparison of experimental 
results must be made by the application of rigid statistical tests based on representa- 
tive samples. 

It is theoretically possible to take an infinite number of samples of the same size
from an infinite population, and to determine the mean of each sample; these means
will differ slightly from each other provided n is sufficiently large. If the samples
follow a normal distribution then these means will also be distributed normally about
the average of all the means, and the standard deviation of this curve will represent
the variability present in the universe. The larger the number of samples taken the
more this curve, which is called the sampling distribution of the mean, will approximate
to the normal smooth curve, and mathematicians have established that such
a curve has a standard deviation of $\sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of the
universe and n is the size of the sample. This means that if a quantity is normally
distributed with standard deviation $\sigma$, then the means of samples containing n
items are normally distributed with standard deviation $\sigma/\sqrt{n}$. This quantity
$\sigma/\sqrt{n}$ is called the STANDARD ERROR and can be easily determined for any sample of
size n if the $\sigma$ of the population is known. We have shown in our consideration of
the probability integral that the probability of occurrence of a given deviation from
the mean can be estimated by the ratio

    $\dfrac{\text{deviation}}{\text{standard deviation}}$

in quantities which follow a normal distribution. Since $\sigma/\sqrt{n}$ is the standard
deviation of the sampling distribution of means, the significance of the difference
between the mean of a sample and the true mean of the universe, which is represented
by the average of the means of all samples, can be estimated by

    $\dfrac{\text{deviation}}{\text{standard error}}$   or   $\dfrac{\text{deviation}}{\sigma/\sqrt{n}}$

In actual practice the true mean of the universe is a hypothetical quantity which
we do not know and our comparison is always between the means of different samples.
If we take a number of different samples from a normal population and take
the differences between pairs of means then it can be shown mathematically
that such differences also follow a normal distribution and that for any two means
the standard error of the difference between them is given by

    $\text{S.E.}_d = \sqrt{(\text{S.E.}_1)^2 + (\text{S.E.}_2)^2}$    (19)

The ratio of the difference to the standard error of the difference, $d/\text{S.E.}_d$, can then be
used to determine the significance of the observed deviation.

The details will become clear from the actual working of examples.


Example 11

Comparison of length of ear-head in Pusa 4 wheat in two successive years

Year       No. of plants   Mean length   Standard deviation   Standard error of mean
1931-32    400             7.88 cm.      1.09                 1.09 ÷ √400 = 0.0545
1932-33    400             7.82 cm.      0.90                 0.90 ÷ √400 = 0.0450

Difference in mean length of ear-heads = 0.06 cm.

Standard error of difference = ± $\sqrt{(0.0545)^2 + (0.0450)^2}$ = ± 0.07068

Therefore

    difference / standard error of difference = 0.06 / 0.07068 = 0.8489

The difference, being smaller than the standard error of the difference, is not
statistically significant; the difference in the mean length of ear-heads in the two
samples is such as would occur due to chance.

Example 12

Comparison of mean heights in two varieties of oats grown under identical conditions

Variety              No. of plants   Mean height   Standard deviation   Standard error of mean
Scotch Potato oats   921             104.76 cm.    8.82                 8.82 ÷ √921 = 0.2906
B. S. 2 oats         970             150.21 cm.    5.17                 5.17 ÷ √970 = 0.1659

Difference in mean heights = 45.45 cm.

Standard error of difference = ± $\sqrt{(0.2906)^2 + (0.1659)^2}$ = ± 0.3346

Therefore

    difference / standard error of difference = 45.45 / 0.3346 = 135.8

The difference being many times the standard error of the difference, we conclude
that B. S. 2 oats are really taller than Scotch Potato oats. It is necessary to explain
the size of the ratio between the mean difference and its standard error which is
considered to indicate significance. We have seen that in the normal curve (page
25) a deviation from the mean of $\pm 2\sigma$ includes approximately 95 per cent of the
items in the sample; in other words, only 5 times in a hundred cases will items
differing from the mean by $\pm 2\sigma$ or more be realized. This may be expressed by saying
that the probability of a deviate exceeding $2\sigma$ is 5 per cent or P = 0.05.
Considering example 9, we see that the ordinates erected at $\pm 2\sigma$ cut off from the curve
two tails the sum of whose area is 5 per cent of the whole area of the curve; this
makes clear the reason for adopting a measure of significance of twice the standard
error or, as it is called, the 0.05 level of significance. This criterion is generally
adopted by statisticians and differences between means of less than $2\sigma$ are considered
to be such as would occur 5 times in a hundred from the operation of the chance
errors of the experiment. In this book, particularly in the section on yield trials,
we generally adopt a stricter criterion of P = 0.01, which means that only once
in a hundred repetitions of the experiment would chance errors of random
sampling give a deviation as large as or larger than that observed.
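
The working of Examples 11 and 12 can be retraced with a short Python sketch. The
function names are illustrative only and do not come from the text; the probability
is taken from the normal curve, which is a reasonable assumption only for large
samples such as these.

```python
import math

def standard_error(sd, n):
    """Standard error of a mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

def se_of_difference(se_1, se_2):
    """Formula (19): standard error of the difference between two means."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2)

def two_tailed_probability(ratio):
    """Chance that a normal deviate exceeds `ratio` standard errors in either direction."""
    return math.erfc(abs(ratio) / math.sqrt(2))

# Example 11: ear-head length of Pusa 4 wheat, 400 plants in each year.
se_diff_11 = se_of_difference(standard_error(1.09, 400), standard_error(0.90, 400))
ratio_11 = (7.88 - 7.82) / se_diff_11

# Example 12: heights of Scotch Potato oats (921 plants) and B. S. 2 oats (970 plants).
se_diff_12 = se_of_difference(standard_error(8.82, 921), standard_error(5.17, 970))
ratio_12 = (150.21 - 104.76) / se_diff_12

print("Example 11:", round(se_diff_11, 5), round(ratio_11, 4),
      round(two_tailed_probability(ratio_11), 3))
print("Example 12:", round(se_diff_12, 4), round(ratio_12, 1))
print("P beyond twice the standard error:", round(two_tailed_probability(2.0), 4))
```

The last line shows that the two tails beyond twice the standard error hold a little
under 5 per cent of the area, which is why that multiple serves as the 0.05 level.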

Bessel's method—

The determination of the significance of the difference between two means by
the use of the formula

    $E_d = \sqrt{E_1^2 + E_2^2}$

involves the same principle whether the biometrical constant used is the standard
error or the probable error. Since, however, the standard error is $\sigma/\sqrt{n}$ and the
probable error is $0.6745\,\sigma/\sqrt{n}$, it follows that the relationship

    2 S.E. = 3 P.E.    (20)

approximately holds good and a deviation of twice the standard error has, therefore,
the same significance as three times the probable error. The method of determining
significance by the probable errors and the theory of least squares is known as Bessel's
method and the criterion of significance generally adopted is that the ratio of the
mean difference to the probable error of the difference should be more than 3.2 (see
page 30). The method is reliable when the sample is large and when there is no
correlation between the different observations, as in many chemical and physical
determinations, but it is less reliable in its application to field trials where there
may be a high correlation between the yielding powers of adjacent plots. We shall
revert to this later in our consideration of yield trials.
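
The factor 0.6745 and relationship (20) can be checked numerically. The sketch below
uses Python's standard library; the names `PE_FACTOR` and `probable_error` are
illustrative, not the author's notation.

```python
import math
from statistics import NormalDist

# The probable error is the deviation exceeded, in either direction, by half of all
# normal deviates; in units of the standard error this is the 75th percentile of
# the standard normal curve.
PE_FACTOR = NormalDist().inv_cdf(0.75)

def probable_error(sd, n):
    """Probable error of a mean: 0.6745 times the standard error."""
    return PE_FACTOR * sd / math.sqrt(n)

# Twice the standard error, expressed in probable errors.
two_se_in_pe = 2 / PE_FACTOR

print(round(PE_FACTOR, 4))
print(round(two_se_in_pe, 3))
print(round(probable_error(1.09, 400), 4))
```

Twice the standard error comes out slightly below three probable errors, so (20) is
an approximation, as the text states.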


SIGNIFICANCE IN SMALL SAMPLES

In the cases which we have considered up to now it has always been possible to
secure a sample of such a size that, given an adequate method of sampling, the sample
should be truly representative of the universe. While it is possible to handle a
sample of many hundred measurements of a quantitative character, there are other
classes of work in which the conditions of the experiment compel us to be satisfied
with relatively few observations. It is obvious, for instance, that in a field
experiment the number of plots which can be handled is limited and therefore for this
type of experiment we have to develop a statistical theory which will allow of
satisfactory comparisons being based on small samples. We have seen that in the case
of large samples the means of a number of samples from a universe are distributed
normally with a standard deviation of $\sigma/\sqrt{n}$; if $M_d$ is the deviation of a sample
mean from the mean of the universe, it follows that

    $\dfrac{M_d}{\sigma/\sqrt{n}}$

is also distributed normally. In practice, we do not know $\sigma$ of the universe and we
have to substitute for it s, the standard deviation of the sample, and it is on
account of this that our theory has to be modified in the case of small samples.
When n is large, the quantity

    $\dfrac{M_d}{s/\sqrt{n}}$

approximates in its distribution sufficiently nearly to the quantity

    $\dfrac{M_d}{\sigma/\sqrt{n}}$

to allow of a criterion of twice the standard error being taken as a measure of the
5 per cent level of significance. But when n is small, the quantity

    $\dfrac{M_d}{s/\sqrt{n}}$

is not sufficiently normal in its distribution to allow of the use of this criterion.

Student's 'z'. An anonymous investigator, 'Student'*, has worked out the
distribution of a quantity $M_d/s$ and has constructed a table giving for different
values of n the probability integrals of this quantity. This quantity is called
Student's z and is generally expressed as

    $z = \dfrac{M_d}{s} = \dfrac{M_d}{\sqrt{\sum d^2 / n}}$    (21)

where $M_d$ = mean difference,
s = standard deviation of the difference of the two samples,
$\sum d^2$ = sum of squares of the deviations from the mean,
n = size of the sample.

Student's table of z is available in its original form (Biometrika, Vol. VI, page
19, and in Pearson's Tables, Vol. I, Table XXV, page 36) and as modified by Love
(Jour. Amer. Soc. Agron., Vol. XVI, page 68). From these tables we can read for
any values of z and n the probable significance of the observed difference.


Degrees of freedom

With small samples the best estimate of variability is found by dividing the total
sum of squares of deviations not by the number of observations (n) but by the degrees
of freedom, that is, one less than the number of observations (n − 1). The formula,
therefore, for the standard deviation becomes in the case of small samples:—

    $s = \sqrt{\dfrac{\sum d^2}{n - 1}}$    (22)


A simple example will illustrate the difference which this modification makes.

Example 13. In Table XV, the yields of Pusa barley type 21 as obtained from
50 small plots are arranged in groups of 5 plots.

* Whose work has formed the basis of the study of statistics of small samples.

Table XV

Yield of Pusa barley type 21 in 50 small plots arranged in groups of 5 plots each

Group     I     II    III    IV     V     Mean
1        380   618   497   751   396    528.4
2        352   323   333   188   385    316.2
3        366   344   544   307   416    395.4
4        633   630   231   361   280    427.0
5        575   487   298   510   507    475.4
6        274   546   464   623   636    508.6
7        438   295   268   291   388    336.0
8        418   372   270   293   238    318.2
9        345   344   340   372   435    367.2
10       478   377   585   465   605    502.0


The general mean of all the 50 plots is 417.44 grms. and with n = 50, the value
of the standard deviation,

    $s = \sqrt{\dfrac{824174.32}{50}} = 128.39.$

If, however, the appropriate degrees of freedom are used as the divisor instead of
the actual number of observations, we get

    $s = \sqrt{\dfrac{824174.32}{49}} = 129.69.$

It is evident, therefore, that with a sample of 50 observations, in this case, the
difference between the values of the standard deviations determined by the use of n or
n − 1 is not very large.
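
The two divisors can be compared directly on the 50 plot yields of Table XV. The
following is a minimal Python sketch; the variable names are illustrative and the
figures are those of the table as read here.

```python
import math

# Yields of Table XV, one inner list per group of 5 plots.
groups = [
    [380, 618, 497, 751, 396],
    [352, 323, 333, 188, 385],
    [366, 344, 544, 307, 416],
    [633, 630, 231, 361, 280],
    [575, 487, 298, 510, 507],
    [274, 546, 464, 623, 636],
    [438, 295, 268, 291, 388],
    [418, 372, 270, 293, 238],
    [345, 344, 340, 372, 435],
    [478, 377, 585, 465, 605],
]
yields = [y for group in groups for y in group]

n = len(yields)
general_mean = sum(yields) / n
sum_of_squares = sum((y - general_mean) ** 2 for y in yields)

sd_divisor_n = math.sqrt(sum_of_squares / n)         # divisor: number of observations
sd_divisor_df = math.sqrt(sum_of_squares / (n - 1))  # divisor: degrees of freedom

print(n, round(general_mean, 2), round(sum_of_squares, 2))
print(round(sd_divisor_n, 2), round(sd_divisor_df, 2))
```

With 50 observations the two estimates differ by about one per cent; the gap widens
as n falls, which is the point taken up next with the groups of five.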

Considering now the means of all the 10 groups, of 5 plots each, into which the
yields of 50 plots of barley were arbitrarily divided and taking the sum of squares of
deviations from the sample mean in each group, we obtain

    60963.8240,

as the sum of squares of deviations of all the 10 groups. Since each sample has 5
observations, the total degrees of freedom are 10 (5 − 1) = 40 and the calculation
of the standard deviation using 10 samples of 5 plots each becomes




which gives a fairly close agreement to the value 128.39 calculated on the basis of
the number of observations (n). If when dealing with the squares of deviation from
the sample mean we had taken as the divisor n = 50 and not 10 (5 − 1) = 40,
we should have got

    $\sqrt{\dfrac{60963.8240}{50}} = 110.42,$

a value which departs widely from 128.39, the standard deviation of the single large
sample. Obviously in the case of the small samples of 5 each the use of degrees of
freedom has given us a value of s closer to the real value.

Fisher's 't'. A more recent development of statistical theory for dealing with
significance in small samples has been made by Fisher, who has constructed a table
of the probability integral of a quantity which he calls 't'. This quantity differs
from Student's 'z' in that the mean difference is considered in relation to the
standard error and not to the standard deviation. Moreover, the standard deviation
is calculated not from the size of the sample (n) but from the size of the sample less
1, i.e., (n − 1); this number is called the degrees of freedom. The formula for
Fisher's 't' is

    $t = \dfrac{d}{s/\sqrt{n}}$    (23)

In using Fisher's table of 't' we enter the table with the degrees of freedom (n − 1)
and not with the number of observations (n) as in the case of Student's table of
'z'. The 't' table gives for degrees of freedom up to 30 and ∞ the probability
that an observed value of 't' will occur as the result of chance errors. If
that probability is low, that is P = 0.01 to 0.05, we conclude that the observed
difference, d, is statistically significant. If, however, the probability observed
from the table is higher than 0.05, we conclude that the chance errors of the
experiment would be liable to give a difference of the order of magnitude of that observed.
Thus, when the degrees of freedom are 7 and the observed value of 't' is 3.25, we see
from the table that the probability P lies between 0.01 and 0.02; that is to say,
only once or twice in a hundred trials would a value of 't' of this size result by chance.
On the other hand, with 7 degrees of freedom and 't' = 0.6 the value of P lies
between 0.50 and 0.60, denoting that 50 or 60 times in a hundred trials such a value
of 't' would occur by chance alone. Obviously, therefore, with these degrees of
freedom a value of 't' of 3.25 or more shows that the observed difference is
significant and a value of 't' of about 0.6 indicates that the observed difference is not
significant.

The importance of the size of the sample in relation to the reliability of the
results based on a sample is brought out well by the curve (Fig. 9) which shows the
decrease in the value of 't' as the number of degrees of freedom increases.

The value of 't' decreases very rapidly at the P = 0.01 level up to about 5 degrees
of freedom; with more than 6 degrees of freedom the decrease in the value of 't'
is very small and we should, therefore, distrust the reliability of results based on
samples of fewer than five observations.

The value of the standard deviation calculated on the basis
of the number of observations (n) can be readily converted into that calculated on a
basis of degrees of freedom by multiplying by the fraction $\sqrt{\dfrac{n}{n-1}}$.
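
The probabilities quoted above from the 't' table, and the conversion between the two
forms of the standard deviation, can be checked without a printed table. The sketch
below integrates the density of 't' numerically by Simpson's rule, using only the
standard library; it is an illustration, not the way Fisher's table was prepared, and
the function names are hypothetical.

```python
import math

def t_density(x, df):
    """Density of the 't' distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Probability that |t| is exceeded by chance; `steps` must be even."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

def to_degrees_of_freedom_basis(sd_n, n):
    """Convert a standard deviation computed with divisor n to divisor n - 1."""
    return sd_n * math.sqrt(n / (n - 1))

print(round(two_tailed_p(3.25, 7), 4))   # the text places this between 0.01 and 0.02
print(round(two_tailed_p(0.6, 7), 4))    # the text places this between 0.50 and 0.60
print(round(to_degrees_of_freedom_basis(2.458, 6), 4))
```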


In comparing results by Fisher's 't', we must distinguish between cases in which
it is legitimate to pair observations and take differences and cases in which
observations cannot be paired. When the two sets of observations to be compared relate
to samples under identical conditions, except for the variable under study, then
the procedure outlined above and illustrated below in Example 14 can be followed.
Thus, in the case of yield trials, when we are comparing two treatments in contiguous
plots in the same field the yields may be taken in pairs, since the experiment is so
designed that the factors affecting the plots are equal in their effect on both plots
of a pair and the differences between the treatments will not be disturbed thereby.
In such a case the degrees of freedom are one less than the number of pairs of
observations, that is one less than the number of differences.

It may be, however, that the data are derived from two samples which are
independent and which are not related to one another in the sense of being under uniform
conditions and in which, therefore, the treatments cannot be paired. In this case
the best estimate of the standard deviation which we can make from two samples
is given by

    $s = \sqrt{\dfrac{\sum d_1^2 + \sum d_2^2}{(N_1 - 1) + (N_2 - 1)}}$    (24)

where $\sum d_1^2$ and $\sum d_2^2$ are the sums of squares of the deviations in each sample
from its own mean; therefore, in this case

    $t = \dfrac{m_1 - m_2}{s\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}}$    (25)

where $m_1$ and $m_2$ are the two means and $N_1$ and $N_2$ the sizes of the two samples and
s has been calculated by the above formula. In this case the degrees of freedom
are

    $(N_1 - 1) + (N_2 - 1)$

and we enter the 't' table accordingly, that is to say we enter the 't' table with
the sum of the degrees of freedom for the two samples.

Mahalanobis has published a modification of the 't' table which he calls the 'f'
table and which may be used for the comparison of unrelated samples. The use of
this table is explained at page 62.

Comparison of related samples 

A few examples will make clear the use of Student's 'z' and Fisher's 't' in
determining the significance of mean differences.

Example 14. Comparison of yields of two varieties of gram with 6 replications.

Table XVI

Yields of gram, types 17 and 51

Replications   Yield of type 17   Yield of type 51   Differences d   d²
               A                  B                  (B − A)
1              20.50              24.86              +4.36           19.009
2              24.60              26.39              +1.79            3.204
3              23.06              28.19              +5.13           26.317
4              29.98              30.75              +0.77            0.593
5              30.37              29.98              −0.39            0.152
6              23.83              22.04              −1.79            3.204
Total          152.34             162.21             +9.87           52.479
Means          25.39              27.035             +1.645

Student's 'z' test—

    Mean difference = 9.87 / 6 = 1.645 lb.

We now proceed to calculate the standard deviation using the number of
observations (n = 6) as the divisor and applying the method described at page 22. In the
present example since the frequency of each observation is unity, f = 1 and the
formula for standard deviation is

    $s = \sqrt{\dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2}$    (26)

cf. formulae (2) and (6).

We obtain, therefore,

    $s = \sqrt{\dfrac{52.479}{6} - (1.645)^2} = 2.458$

    $z = \dfrac{\text{Mean difference}}{s} = \dfrac{1.645}{2.458} = 0.669$

From Pearson's Table XXV, we see that

    when n = 6 and z = 0.6, P = 0.88129 and
    when n = 6 and z = 0.7, P = 0.91085.

In the present case, therefore, since z = 0.67, the probability P will lie between
0.88 and 0.91. Taking P as equal to 0.90, we see that the odds are 90 : 10 or 9 : 1
against a difference as great as the observed difference occurring due to chance
alone. Such odds, however, are not sufficiently high to indicate a real difference in
yielding power between the two varieties. Love has published tables which give
directly the odds that an observed difference is significant. From Love's Table we
see that in the present example

    when n = 6 and z = 0.65 the odds are 8.62 : 1 and
    when n = 6 and z = 0.70 the odds are 10.2 : 1,

a result which agrees substantially with the above calculation.


Fisher's 't' test—

In applying this test we must remember that the standard deviation of the
difference must be calculated on the basis of the degrees of freedom. As already explained
this could be done by multiplying the value of the standard deviation obtained for
Student's 'z' by the fraction $\sqrt{\dfrac{n}{n-1}}$. In this case, therefore,

    $s = 2.458 \times \sqrt{\dfrac{6}{6-1}} = 2.6925$

    $t = \dfrac{d}{s/\sqrt{n}} = \dfrac{1.645}{2.6925/\sqrt{6}} = 1.4962$

Entering Fisher's 't' table with 5 degrees of freedom, we find that our observed
value of 't', viz. 1.496, is very close to the value 1.476 which lies at the P = 0.2 level
of significance; this means that approximately 20 times in a hundred repetitions of
the experiment would a value of 't' as large as the observed value result from
chance alone, and the odds therefore are only 4 : 1 against the observed difference
being due to chance alone. These odds are approximately half those given by the
application of Student's 'z' method. This is because in calculating probability
Student's 'z' table considers only one tail of the curve (P = 0.90 and one tail =
1 − 0.90 = 0.10). Actually a difference of the observed magnitude might be
either positive or negative and therefore the probability of occurrence of a value of
'z' equal to 0.67 is given by 2(1 − 0.90) = 0.20, which agrees with P = 0.2 given
by the 't' method. Fisher's table of 't', therefore, considers positive and
negative differences and takes account of both tails of the curve [Tippett, 1931].
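
The arithmetic of Example 14 can be followed in a short Python sketch working from
the two yield columns of Table XVI. The variable names are illustrative. Substituting
the conversion factor into formula (23) also gives t = z × √(n − 1), so the two
statistics carry the same information.

```python
import math

type_17 = [20.50, 24.60, 23.06, 29.98, 30.37, 23.83]   # yields A
type_51 = [24.86, 26.39, 28.19, 30.75, 29.98, 22.04]   # yields B

differences = [b - a for a, b in zip(type_17, type_51)]
n = len(differences)
mean_difference = sum(differences) / n
sum_sq_about_mean = sum((d - mean_difference) ** 2 for d in differences)

# Student's z: standard deviation with divisor n.
sd_n = math.sqrt(sum_sq_about_mean / n)
z = mean_difference / sd_n

# Fisher's t: standard deviation with divisor n - 1, then the standard error.
sd_df = math.sqrt(sum_sq_about_mean / (n - 1))
standard_error = sd_df / math.sqrt(n)
t = mean_difference / standard_error

print(round(mean_difference, 3), round(sd_n, 3), round(z, 3))
print(round(sd_df, 4), round(standard_error, 3), round(t, 3))
print(round(z * math.sqrt(n - 1), 3))   # the same value as t
```

Small differences in the last decimal place from the figures in the text arise
because the text rounds the intermediate values.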


Critical difference.—Since Fisher's 't' is

    $t = \dfrac{d}{s/\sqrt{n}}$

then for any value of 't' corresponding to a definite level of probability we have a
significant or critical difference

    d = t × S.E.    (27)

In the present example the standard error (S.E.) is

    $\text{S.E.} = \dfrac{2.6925}{\sqrt{6}} = 1.099.$

With 5 degrees of freedom at the 1 per cent level of probability, t = 4.032.
Therefore at the P = 0.01 level of significance the critical difference 'd' = 4.032 × 1.099
= 4.4312 but our observed difference is only 1.645 and this being less than the critical
difference is not significant at the 1 per cent level. Similarly at the P = 0.05 level,
we find that the critical difference is 2.826 and therefore the observed difference,
1.645, is not significant even at the 5 per cent level.
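
Formula (27) is sketched below in Python. The value t = 4.032 is the one quoted in
the text; t = 2.571 for the 5 per cent level with 5 degrees of freedom is the usual
tabulated value and is supplied here as an assumption, since the text gives only the
resulting critical difference. The function name is illustrative.

```python
import math

def critical_difference(t_value, sd_df, n):
    """Formula (27): tabulated t multiplied by the standard error of the mean difference."""
    return t_value * sd_df / math.sqrt(n)

T_1_PER_CENT_5_DF = 4.032   # quoted in the text
T_5_PER_CENT_5_DF = 2.571   # assumed from the standard 't' table

observed_difference = 1.645
cd_01 = critical_difference(T_1_PER_CENT_5_DF, 2.6925, 6)
cd_05 = critical_difference(T_5_PER_CENT_5_DF, 2.6925, 6)

print(round(cd_01, 3), observed_difference > cd_01)
print(round(cd_05, 3), observed_difference > cd_05)
```

Both comparisons come out false, in agreement with the conclusion that the observed
difference is not significant at either level.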

It is convenient, particularly in experiments such as yield trials, to consider one
variable as the control and to express the observed differences as a percentage of the
mean of the control. The critical difference may then also be expressed as a
percentage of the mean of the control and the significance of the observed difference
estimated on a percentage basis. In this experiment, type 17 is taken as control
and its mean yield is 25.39 lb. The observed mean difference between the two types
in the experiment is 1.645 lb., therefore—

    the percentage difference = (1.645 × 100) / 25.39 = 6.48 per cent

and the percentage critical difference at the P = 0.01 level

    = (4.4329 × 100) / 25.39 = 17.46 per cent.

The observed percentage difference, being less than the percentage critical difference,
is not statistically significant at the P = 0.01 level.


Example 15. Improvement in milk yields by special treatment of cows.

Experiments in the agricultural section at Pusa have been carried out to determine
the effect of special handling and feeding on milk yields in the Sahiwal herd. A
group of cows was subjected to special treatment and the milk yield under this
treatment was compared with previous best lactations of the same cows. The
following data are taken from the results:—

Table XVII

Average daily milk yield in lbs. per cow

Identity No.   Average milk yield per day in lbs.                    Difference   d²
of the cow     Under special   For previous best lactation           d
               treatment       under normal treatment
555            16.3            16.1                                   0.2           0.04
562            24.4            16.2                                   8.2          67.24
564            22.0             6.1                                  15.9         252.81
566            29.2            16.6                                  12.6         158.76
575            25.6             8.8                                  16.8         282.24
577            19.5             6.8                                  12.7         161.29
586            18.1             9.9                                   8.2          67.24
589            20.2            10.6                                   9.6          92.16
591            25.1             9.8                                  15.3         234.09
597            26.8            12.2                                  14.6         213.16
Total                                                               114.1        1529.03

Student's 'z' test.—Mean difference = 114.1 ÷ 10 = 11.41 lbs.

    $s = \sqrt{\dfrac{1529.03}{10} - (11.41)^2} = 4.766$

    $z = \dfrac{11.41}{4.766} = 2.394.$

From Pearson's Table XXV, we find that the appropriate value of the probability
for this value of z is 0.99997. Therefore, the odds are 99997 : 3 against a difference
of the magnitude of the observed difference being due to chance. In other words,
we infer that the increased milk yield is produced by the special treatment of the
cows.

Fisher's 't' test.—Mean difference = 114.1 ÷ 10 = 11.41 lbs.

    $s = 4.766 \times \sqrt{\dfrac{n}{n-1}} = 4.766 \times 1.0535 = 5.0210$

    $\text{S.E.} = \dfrac{5.0210}{\sqrt{10}} = 1.588$

    $t = \dfrac{d}{\text{S.E.}} = \dfrac{11.41}{1.588} = 7.1851.$

Expected value of 't' for P = 0.01 and n = 9 is 3.250. Therefore, the observed
value of 't', viz., 7.1851, which is much greater than the expected value of 3.250,
indicates very high significance in favour of the yields obtained by the special
treatment of the cows.

Expressing the observed mean difference and the critical difference as percentages
of the mean of the control, i.e., the mean of the previous best lactation, we obtain—

    the percentage difference = (11.41 × 100) / 11.31 = 100.88 per cent

and the percentage critical difference at the P = 0.01 level

    = ((3.250 × 1.588) × 100) / 11.31 = 45.63 per cent.

An increase of 100.88 per cent, which is greater than the percentage critical
difference, thus shows a significant difference in milk yields by special treatment of
the cows.

This experiment is, in its statistical aspect, strictly comparable with the classical
experiment of Cushny and Peebles on the soporific effect of the optical isomers of
hyoscyamine hydrobromide, which is used as an example in Student's paper [Biometrika,
VI, page 19] and by Fisher in his book (page 105). In both the experiments
two treatments were applied to the same group of patients.


Comparison of independent samples

Let us suppose that the above figures (Table XVII) had been obtained by
subjecting two different groups of cows to the two treatments, one group to the normal
and one to the special treatment; the experiment would have been less well controlled
because it is probable that individual responses to the treatments would to a certain
extent be correlated. Thus, in the first instance, when only one group of animals
is used for the two treatments, we should expect that a high-yielding individual
under one treatment would be a high-yielding individual under the other treatment,
whereas, using two groups of animals, individual variations in the yielding power of
animals in each group, apart from the effects of the treatments, will have a powerful
influence.

If the data are derived from two groups which are not strictly comparable, that
is, if the two groups of observations are independent of each other, then the standard
deviation is calculated by dividing the sums of squares from the two samples by the
total number of degrees of freedom contributed by them; 't' is then calculated by
dividing the difference of the two means by the standard error and entering the
table of 't' with degrees of freedom equal to the sum of the degrees of freedom
from the two samples. Taking the above figures to represent two different groups of
cows, we have—

Table XVIII

Comparison of milk yields in Sahiwal cows

Treatment              Size of sample   Mean yield in lb.   Sum of squares   Variance         (Std. error)²
                       N                m                   Σd²              Σd² / (N − 1)    Σd² / N(N − 1)
1. Special treatment   10               22.72               155.216          17.2462          1.7246
2. Normal treatment    10               11.31               134.189          14.9100          1.4910

Let $m_1$ and $m_2$ be the two mean values based on samples of size $N_1$ and $N_2$ and
let $v_1$ and $v_2$ be the two corresponding variances. Then

    $t = \dfrac{m_1 - m_2}{\sqrt{\dfrac{v_1}{N_1} + \dfrac{v_2}{N_2}}}$    (28)

or, if s has been calculated by formula (24), we can apply formula (25) to determine
't'.

Substituting the required values, we have,

    $t = \dfrac{22.72 - 11.31}{\sqrt{1.7246 + 1.4910}} = 6.364.$

Entering the table with n = 18, the sum of the degrees of freedom of the two
samples, we find that the observed value of 't' lies beyond the 0.01 level of
significance and hence the difference is statistically significant.
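
The comparison of the two columns as independent samples can be sketched in Python
as follows. The yields are those of Table XVII, taken with the values that reproduce
the means and sums of squares of Table XVIII; the function and variable names are
illustrative. With equal sample sizes the standard-error form and the pooled form of
formulae (24) and (25) give the same 't'.

```python
import math

special = [16.3, 24.4, 22.0, 29.2, 25.6, 19.5, 18.1, 20.2, 25.1, 26.8]
normal = [16.1, 16.2, 6.1, 16.6, 8.8, 6.8, 9.9, 10.6, 9.8, 12.2]

def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values):
    """Sum of squared deviations of a sample from its own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

n1, n2 = len(special), len(normal)
ss1, ss2 = sum_of_squares(special), sum_of_squares(normal)

# Standard-error form: each variance divided by its own sample size.
v1, v2 = ss1 / (n1 - 1), ss2 / (n2 - 1)
t_from_variances = (mean(special) - mean(normal)) / math.sqrt(v1 / n1 + v2 / n2)

# Pooled form: formula (24) for s, then formula (25) for t.
pooled_s = math.sqrt((ss1 + ss2) / ((n1 - 1) + (n2 - 1)))
t_pooled = (mean(special) - mean(normal)) / (pooled_s * math.sqrt(1 / n1 + 1 / n2))

degrees_of_freedom = (n1 - 1) + (n2 - 1)
print(round(ss1, 3), round(ss2, 3), round(pooled_s, 2))
print(round(t_from_variances, 3), round(t_pooled, 3), degrees_of_freedom)
```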

In examples such as this when $N_1 = N_2 = N$, we can substitute for the above
formula for determining the value of 't' the formula given below:—

    $t = \dfrac{(m_1 - m_2) \times \sqrt{N}}{\sqrt{v_1 + v_2}}$    (29)

In the present case, therefore—

    $t = \dfrac{(22.72 - 11.31) \times \sqrt{10}}{\sqrt{17.2462 + 14.9100}} = 6.364.$

The student should note that with independent samples the value of the standard
deviation of the difference obtained from the formula $\sqrt{v_1 + v_2}$
is 5.671 and is very different from the value obtained when the observations are paired.
If we apply formula (24) we have from Table XVIII,

    $\sum d_1^2 = 155.216$ and $\sum d_2^2 = 134.189$

therefore

    $s = \sqrt{\dfrac{155.216 + 134.189}{(10 - 1) + (10 - 1)}} = 4.01$

hence

    $t = \dfrac{22.72 - 11.31}{4.01 \times \sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = 6.364.$

Mahalanobis has published a table which is a modification of Fisher's 't' table
and which may be used to determine the significance of differences between
independent samples. This table gives the distribution of a quantity 'f' at different
levels of significance,

    $f = \dfrac{m_1 - m_2}{S_{av.}}$    (30)

where $m_1$ and $m_2$ are the two mean values and $S_{av.}$ is the average standard
deviation of the samples determined by the formula $S_{av.} = \sqrt{\dfrac{s_1^2 + s_2^2}{2}}$.

In the case of milk yields, we have

    $S_{av.} = \sqrt{\dfrac{17.2462 + 14.9100}{2}} = \sqrt{16.0781} = 4.01,$

a value which agrees with that obtained by applying formula (24).
It follows, therefore, that

    $f = \dfrac{11.41}{4.01} = 2.8454.$

From the table of f, which is entered with n equal to the size of the sample, that is
the number of differences in this case, we read the value of f for n = 10 at the
P = 0.01 level. The observed value of 'f' exceeds it and is, therefore, significant
at the low level of one per cent and hence we conclude that the difference in milk
yields is significant.

From the 'f' table we can read directly the size of the samples required for
differences of given significance. Thus, in our example, with the observed value of f
we require at the 1 per cent level a sample of only 3 or 4 cows. Fisher's 't' table
can, of course, also be used in this way.

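The relationship between 'f' and 't' is given by—

    $f = \dfrac{m_1 - m_2}{S_{av.}}$   and   $S_{av.} = \sqrt{\dfrac{s_1^2 + s_2^2}{2}}$,

whereas

    $t = \dfrac{m_1 - m_2}{\sqrt{\dfrac{s_1^2}{n} + \dfrac{s_2^2}{n}}} = \dfrac{m_1 - m_2}{S_{av.}\sqrt{\dfrac{2}{n}}}$

and

    $f = t\sqrt{\dfrac{2}{n}}$    (31)

Substituting in our example,

    $f = 6.364 \times \sqrt{\dfrac{2}{10}} = 2.8454.$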
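
Relation (31) can be verified on the milk-yield figures with a few lines of Python.
This is a sketch for two samples of equal size n, using the means and variances of
Table XVIII; the function names are illustrative and are not taken from Mahalanobis.

```python
import math

def f_statistic(mean_1, mean_2, var_1, var_2):
    """Formula (30): difference of means divided by the average standard deviation."""
    s_av = math.sqrt((var_1 + var_2) / 2)
    return (mean_1 - mean_2) / s_av

def f_from_t(t, n):
    """Formula (31): f obtained from t for two samples of n observations each."""
    return t * math.sqrt(2 / n)

n = 10
f_direct = f_statistic(22.72, 11.31, 17.2462, 14.9100)
t = (22.72 - 11.31) / math.sqrt(17.2462 / n + 14.9100 / n)
f_via_t = f_from_t(t, n)

print(round(f_direct, 4), round(t, 3), round(f_via_t, 4))
```

The two routes give the same value of f; the slight difference from 2.8454 in the
text comes from rounding the average standard deviation to 4.01.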


The determination of the number of samples necessary to measure differences
with varying degrees of precision has been also investigated by Livermore and
[1933], who have given tables showing the number of replications necessary for
significance at the 30 : 1 level for various percentage differences. The method is based
on the assumption that if $E_d = \sqrt{E_a^2 + E_b^2}$, then $E_d = E_a\sqrt{2}$, since $E_a$
and $E_b$ are generally equal.

The method of Mahalanobis ('f' table) appears to be more accurate since it
is based on Fisher's 't' criterion.

CHAPTER VII

CORRELATION

In our statistical study of variation, we have, up to the present, considered only
the measurement of variation in a single variable. It often happens, however,
that changes in one variable are accompanied by changes in another and that a
definite relation exists between the magnitudes of the changes in each variable. In
other words, there is an association or correlation between the two variables. Thus
in our example of variation in the length of ear-head and number of grains per
head in wheat we find that heads of greater length possess a higher number of grains.

When two variables change together in such a way that a fixed increase in one
variable is accompanied by a fixed increase in the other, the variables are said to
be positively correlated. A perfect example of correlation is the relationship between
the changes in temperature and the length of an iron bar. For every single degree
rise in temperature, the length of the bar increases by a fixed amount and the
temperature and the length of the bar are positively correlated. In biological
measurements, the relationship between two variables is not likely to be so clear cut and
definite as this, but it is obvious that certain characters may be expected to show
very strong correlation. For instance, in general we should expect a strong positive
correlation between the heights of human beings and their weights and this has, of
course, been found to exist. Again, if stature is a heritable character in human
beings, we should expect a positive correlation between the heights of parents and
their children and this has also been demonstrated. Should an increase in one
variable go hand in hand with a decrease in the other, then these two variables are
said to be negatively correlated. If there is no mutual relationship between two
variables then they are said to be independent or uncorrelated.


The correlation table 

The correlation table expresses the relationship between the two variables. For
example, the number of grains and the length of heads in Pusa 12 wheat are entered
up in a table (Table XIX) in such a way that one variable is arranged vertically
and the other horizontally. In the class of 15 grains to the ear, for instance,
there are 2 ears with an average length of 5.5 cm.; 1 with 6 cm.; 8 with 6.5 cm.;
2 with 7 cm.; 3 with 7.5 cm.; and 1 with 8 cm. This shows how in a total of
17 ears with an average number of 15 grains to the ear the variation in length is
distributed. In this way all the 400 ear-heads are grouped together according to
their respective lengths and the number of grains per ear. The shape of the
distribution indicates the extent and the nature of the correlation; the more
elliptical the distribution the stronger the correlation. If the long axis of the
ellipse slopes from left to right the correlation is positive; negative correlation is
indicated when the long axis of the ellipse slopes from right to left, and if the
distribution in the correlation table is not markedly elliptical the quantities are not
correlated (Fig. 10).





Fig. 10. Dispersion of data in correlation table: (a) uncorrelated, (b) positive, and (c) negative

Table XIX shows the distribution of the data in the correlation of length of 
head and the number of grains per head in Pusa 12 wheat, and from the nature of 
the ellipse enclosing the area covered with the data it is obvious that there is strong 
positive correlation. 
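The grouping that produces a correlation table can be sketched in a few lines of Python. This is only an illustration: the function names, the rounding rule for assigning a measurement to its class and the six sample measurements are assumptions, not data from Table XIX.

```python
from collections import Counter


def class_value(value, origin, interval):
    """Return the class centre nearest to `value` on the grid origin + k * interval."""
    k = round((value - origin) / interval)
    return origin + k * interval


def correlation_table(pairs, x_origin, x_interval, y_origin, y_interval):
    """Count how many (x, y) pairs fall in each cell of the two-way table."""
    table = Counter()
    for x, y in pairs:
        cell = (class_value(x, x_origin, x_interval),
                class_value(y, y_origin, y_interval))
        table[cell] += 1
    return table


# Hypothetical (length in cm, grains per ear) measurements, for illustration only.
ears = [(5.6, 14), (6.4, 16), (6.6, 15), (7.1, 21), (9.9, 31), (10.2, 29)]
table = correlation_table(ears, x_origin=5.5, x_interval=0.5,
                          y_origin=10, y_interval=5)

for (length, grains), freq in sorted(table.items()):
    print(f"length {length:4.1f} cm, grains {grains:2d}: frequency {freq}")
```

Each key of `table` is one square of the correlation table and each count is the frequency entered in that square.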


Coefficient of correlation


The intensity of correlation is measured by a coefficient, usually indicated by
the symbol r, which is computed according to the formula:

$$ r_{xy} = \frac{S(f \cdot \delta_x \cdot \delta_y)}{N \cdot \sigma_x \cdot \sigma_y} \qquad (32) $$

where $r_{xy}$ is the coefficient of correlation of the variables x and y; $\delta_x$ is the
deviation of the x variables from the mean of x; $\delta_y$ is the corresponding deviation in the
y variables from the mean of y; and $\sigma_x$ and $\sigma_y$ are the standard deviations of x
and y respectively. The item $\frac{S(f \cdot \delta_x \cdot \delta_y)}{N}$ is called the mean product moment.


The direct application of this formula is very laborious but a good short-cut method
is described below, and can be used with reliance.

In this short-cut method the lowest class value in each variable is taken as the
'arbitrary origin' and higher class values are considered in serial order as deviations
from this. Thus for the length of ear-heads the smallest class value is 5.5
cm. and taking this as the arbitrary origin equal to 0, the class value of 6 cm. is a
deviate of 1, the class value of 6.5 is a deviate of 2, the class value of 8.5 cm. is a
deviate of 6, and the largest class value of 13.5 is a deviate of 16. A similar process
is carried out for the class values for the number of grains. In the correlation table,
to each square containing a number expressing the frequency of that particular
variate, we add a number which is the product of the deviations from the two
arbitrary origins. For example, in the class containing ear-heads of a class value
of 10 cm. and number of grains of 30 the deviations from the arbitrary origins
are 9 and 4 respectively and their product, 36, is shown in brackets and
is called the 'INDEX NUMBER'. The sum of the products of the frequencies
and the index numbers is shown in the last column to the right of the table,
$S(f \cdot d_x \cdot d_y)$. The total of this column divided by the size, N, of the sample is
called the mean product moment from the arbitrary origin. A correction has to
be applied to this number because deviations have been taken from arbitrary
origins and not from the true means and because the actual class intervals have not
been used in the calculations; this correction is the product of the two corrections
which were used for the calculations of the means (pages 19 and 20) from the same
arbitrary origins. The corrected mean product moment divided by the two
standard deviations gives the COEFFICIENT OF CORRELATION. In the
present example the calculation is as follows:


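Mean product moment from arbitrary origin

$$ = \frac{S(f \cdot d_x \cdot d_y)}{N} = \frac{16054}{400} = 40.135 $$

Product of the two correction factors = 8.955 × 4.11 = 36.805.

Therefore actual mean product moment,

$$ \frac{S(f \cdot \delta_x \cdot \delta_y)}{N} = (40.135 - 36.805)(i_x \times i_y) = (40.135 - 36.805)(5 \times 0.5) = 8.325 $$

($i_x$ and $i_y$ being the two class intervals)

Product of the two standard deviations, i.e., $\sigma_x$ = 1.4408 (page 20) and $\sigma_y$ =
6.9992 (page 19) = $\sigma_x \cdot \sigma_y$ = 1.4408 × 6.9992 = 10.084,

therefore

$$ r_{xy} = \frac{S(f \cdot \delta_x \cdot \delta_y)}{N \cdot \sigma_x \cdot \sigma_y} = \frac{8.325}{10.084} = 0.8256. $$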
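The arithmetic of the short-cut method can be checked with a small Python sketch. The function and parameter names are illustrative assumptions; the figures passed in are those quoted in the worked example above.

```python
def grouped_correlation(mpm_arbitrary, corr_x, corr_y, i_x, i_y, sd_x, sd_y):
    """Coefficient of correlation from a grouped table by the short-cut method.

    mpm_arbitrary  : mean product moment taken from the arbitrary origins
    corr_x, corr_y : corrections used for the two means (in class units)
    i_x, i_y       : the two class intervals
    sd_x, sd_y     : the two standard deviations (in actual units)
    """
    # Correct for the arbitrary origins, then restore the actual class intervals.
    mpm_true = (mpm_arbitrary - corr_x * corr_y) * (i_x * i_y)
    return mpm_true / (sd_x * sd_y)


r = grouped_correlation(
    mpm_arbitrary=40.135,
    corr_x=8.955, corr_y=4.11,
    i_x=0.5, i_y=5,
    sd_x=1.4408, sd_y=6.9992,
)
print(f"r = {r:.3f}")
```

Because the unrounded intermediate products are carried through, the last decimal place may differ slightly from the hand calculation.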

The probable error of the coefficient of correlation is expressed by the formula

$$ E_r = \frac{\pm 0.6745\,(1 - r^2)}{\sqrt{n}} \qquad (33) $$

The expression $(1 - r^2)$ has been calculated (see Pearson's Table VIII) for all
values of r from r = 0.001 to 0.999 and the term $\frac{0.6745}{\sqrt{n}}$ can be obtained from
Pearson's Table V as already explained on page 27. The calculation of the
probable error of the coefficient of correlation, therefore, is a matter of great
simplicity. In the present example from Pearson's Tables we find:

$$ \frac{0.6745}{\sqrt{400}} = 0.03372, $$

and $1 - (0.8256)^2 = 0.3184$;

therefore $E_r = 0.01074$.
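Where Pearson's tables are not at hand, the same probable error can be computed directly. The following Python sketch assumes only formula (33); the function name is illustrative.

```python
from math import sqrt


def probable_error(r, n):
    """Probable error of a coefficient of correlation r based on n observations."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


pe = probable_error(0.8256, 400)
print(f"probable error = {pe:.5f}")
```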


The standard error of the coefficient of correlation is,

$$ \frac{1 - r^2}{\sqrt{n}} $$


Values of r always lie between 0 and ±1 and the following interpretations are
generally given to different values:

+0.5 to +1.0 indicates high positive correlation

−0.5 to −1.0 indicates high negative correlation

+0.3 to +0.4 indicates moderate positive correlation

−0.3 to −0.4 indicates moderate negative correlation

smaller than ± [illegible] indicates low positive or negative correlation.

If only small numbers of individuals are available for the measurement of each
character the data are not grouped in classes or even in a table. The following
example shows the calculation of the coefficient of correlation between the breakage
of rice grains in milling and the temperature of unhusked rice [Rhind and U. Tin,
1933]. In this problem the respective values of X or temperature, and Y or breakage
percentage, are arranged in parallel columns. The means of X and Y are
calculated directly from the totals of the columns, and the respective standard
deviations are calculated from the sums of squares of each separate entry,
thus

$$ \sigma_x = \sqrt{\frac{S(X^2)}{n} - \bar{X}^2} $$

where $\bar{X}$ indicates the mean of X.

The product moment is obtained by summating the products of X and Y and
dividing by n and subtracting from this the product of the two means, $\bar{X} \cdot \bar{Y}$, thus,

$$ \text{Product moment} = \frac{S(XY)}{n} - \bar{X} \cdot \bar{Y} $$

and therefore,

$$ r_{xy} = \frac{\dfrac{S(XY)}{n} - \bar{X} \cdot \bar{Y}}{\sqrt{\dfrac{S(X^2)}{n} - \bar{X}^2} \times \sqrt{\dfrac{S(Y^2)}{n} - \bar{Y}^2}} $$

Table XX gives the values of X and Y and the details of the calculation of
$\bar{X}$, $\bar{Y}$, $S(X^2)$, $S(Y^2)$ and $S(XY)$.
Table XX

Correlation of the temperature of unhusked rice (Londi temperature) and the percentage
breakage of rice grains in milling

No.     Londi temperature   Percentage breakage   X²          Y²          XY
        X                   Y
1       33.9                27.3                  1149.21     745.29      925.47
2       34.6                29.5                  1197.16     870.25      1020.70
3       34.5                26.8                  1190.25     718.24      924.60
4       36.9                29.5                  1361.61     870.25      1088.55
5       37.1                30.5                  1376.41     930.25      1131.55
6       37.3                29.7                  1391.29     882.09      1107.81
7       28.8                25.6                  829.44      655.36      737.28
8       29.6                25.4                  876.16      645.16      751.84
9       30.7                24.6                  942.49      605.16      755.22
10      31.2                23.6                  973.44      556.96      736.32
11      31.6                26.1                  998.56      681.21      824.76
12      32.2                24.9                  1036.84     620.01      801.78
13      33.4                27.0                  1115.56     729.00      901.80
14      33.6                25.6                  1128.96     655.36      860.16
15      33.6                26.4                  1128.96     696.96      887.04
16      33.9                27.2                  1149.21     739.84      922.08

Total   532.9               429.7                 17845.55    11601.39    14376.96

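$$ \text{Mean of } X = \frac{532.9}{16} = 33.30625 $$

$$ \text{Mean of } Y = \frac{429.7}{16} = 26.85625 $$

Substituting these values in the formula:

$$ r_{xy} = \frac{\dfrac{14376.96}{16} - (33.30625 \times 26.85625)}{\sqrt{\dfrac{17845.55}{16} - (33.30625)^2} \times \sqrt{\dfrac{11601.39}{16} - (26.85625)^2}} = 0.8482. $$

$$ E_r = \frac{\pm 0.6745\,(1 - r^2)}{\sqrt{n}} = \frac{\pm 0.6745\,\{1 - (0.8483)^2\}}{\sqrt{16}} = \pm 0.0473 $$

Therefore $r_{xy} = 0.8482 \pm 0.0473$.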


This shows that there is strong positive correlation between the temperature of 
unhusked rice and the breakage of rice grains in milling. 
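The ungrouped calculation is easily reproduced by machine. The Python sketch below follows the same route as the text: means from the column totals, standard deviations from the sums of squares, and the product moment from the sum of products. The function and variable names are illustrative assumptions; the sixteen pairs are the Londi temperatures and breakage percentages of Table XX.

```python
from math import sqrt

temperature = [33.9, 34.6, 34.5, 36.9, 37.1, 37.3, 28.8, 29.6,
               30.7, 31.2, 31.6, 32.2, 33.4, 33.6, 33.6, 33.9]
breakage = [27.3, 29.5, 26.8, 29.5, 30.5, 29.7, 25.6, 25.4,
            24.6, 23.6, 26.1, 24.9, 27.0, 25.6, 26.4, 27.2]


def correlation(xs, ys):
    """Coefficient of correlation of ungrouped paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Product moment: mean of the products less the product of the means.
    product_moment = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    # Standard deviations from the sums of squares.
    sd_x = sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sd_y = sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    return product_moment / (sd_x * sd_y)


r = correlation(temperature, breakage)
probable_error = 0.6745 * (1 - r ** 2) / sqrt(len(temperature))
print(f"r = {r:.4f} +/- {probable_error:.4f}")
```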


Regression


The coefficient of correlation expresses the degree in which two variables are
interrelated, but it is useful to have some method of computing the average expected
values of one variable corresponding to particular values of the other; in other words
we may wish to calculate the regression of one quantity on the other. The greater
the correlation between two sets of variables, the more accurately may the value
of one variable be predicted from a known value of the other. In our example
of Pusa 12 wheat it is possible to compute an average number of grains for a particular
length of ear-head. This is done by solving an equation the details of the
calculation of which are shown below. In the case of Pusa 12 wheat this equation
ultimately yields an expression

$$ X = 4.7880 + 0.16995\,Y $$

where X is the length of ear-head and Y is the number of grains per ear-head. By
substituting the extreme values of Y, we can obtain corresponding values of X; and
similarly values of Y can be calculated for the extreme values of X and the results
plotted in the form of two straight lines which will intersect at a point corresponding
to the means of the two variables.

The angular difference between the two lines of regression (Fig. 11) is inversely 
proportional to the strength of the correlation between the two quantities. If there 
is no correlation at all the two lines are perpendicular to one another and the mean 
value of one variable is the same for all values of the other. If the coefficient of 
correlation is unity there is perfect association and the two lines will coincide. 

The details of the calculation of the regression of length of ear (X) on the number
of grains per ear (Y) in our example of Pusa 12 wheat are given below:

Regression of length of ear (X) on number of grains per ear (Y) in Pusa 12
wheat (1930-31) is

$$ X = \left(\bar{X} - r\,\frac{\sigma_x}{\sigma_y}\,\bar{Y}\right) + r\,\frac{\sigma_x}{\sigma_y}\,Y $$

$$ = 9.98 - 0.8256 \times \frac{1.4408}{6.9992} \times 30.55 + 0.8256 \times \frac{1.4408}{6.9992} \times Y $$

$$ = 4.7880 + 0.16995\,Y. $$

Substituting the values of the end classes of Y, i.e., 10 and 55 grains respectively
(Fig. 11), we get

X = 4.7880 + 0.16995 × 10 = 6.4875
X = 4.7880 + 0.16995 × 55 = 14.13525
which are the end points of the X axis.
Regression of number of grains per ear (Y) on the length of ear (X) is

$$ Y = \left(\bar{Y} - r\,\frac{\sigma_y}{\sigma_x}\,\bar{X}\right) + r\,\frac{\sigma_y}{\sigma_x}\,X $$

$$ = 30.55 - \frac{0.8256 \times 6.9992}{1.4408} \times 9.98 + \frac{0.8256 \times 6.9992}{1.4408} \times X $$

$$ = -9.4763 + 4.0106\,X. $$

Substituting the values of end classes of X, 5.5 and 13.5 cm. respectively, we get

Y = −9.4763 + 4.0106 × 5.5 = 12.5820
Y = −9.4763 + 4.0106 × 13.5 = 44.6668

which are the end points of the Y axis in Fig. 11.

Thus the average value of X for any value of Y and vice versa can be calculated.
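Both regression lines follow from the two means, the two standard deviations and r, and a short Python sketch makes the symmetry plain. The function name and argument order are illustrative assumptions; the figures are those of the Pusa 12 wheat example.

```python
def regression_line(mean_dep, mean_indep, sd_dep, sd_indep, r):
    """Return (intercept, slope) of the regression of one variable on another.

    The slope is r * sd_dep / sd_indep and the line passes through the two means.
    """
    slope = r * sd_dep / sd_indep
    intercept = mean_dep - slope * mean_indep
    return intercept, slope


mean_x, sd_x = 9.98, 1.4408    # length of ear (cm)
mean_y, sd_y = 30.55, 6.9992   # number of grains per ear
r = 0.8256

a_xy, b_xy = regression_line(mean_x, mean_y, sd_x, sd_y, r)  # X on Y
a_yx, b_yx = regression_line(mean_y, mean_x, sd_y, sd_x, r)  # Y on X

print(f"X = {a_xy:.4f} + {b_xy:.5f} Y")
print(f"Y = {a_yx:.4f} + {b_yx:.4f} X")

# Expected length of ear at the two end classes of Y.
for grains in (10, 55):
    print(grains, round(a_xy + b_xy * grains, 4))
```

The last decimal places may differ slightly from the hand calculation, which rounds the slope before multiplying.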

Having obtained the desired straight-line regressions of X on Y and Y on X as
shown in Fig. 11, it may be desired also to obtain the average deviations per class
of the various x and y arrays as shown by the broken lines plotted in Fig. 11.
The easiest way of doing this is to tabulate the correlation surface as shown in Table 
XXI and to calculate the mean of each horizontal and each vertical frequency 
distribution as shown under 'XiyXjfy. It will be observed that instead of comput- 
ing the SyZ and XxY values from the actual class centres for the number of grains 
and length of ear-heads respectively, these have been obtained from arbitrary 

class centres of 1, 2, 3, 4, 5 etc., in order to save arithmetical labour. The 

values thus obtained are plotted and show the deviation of each class from the 
straight-line regressions. 

The regression coefficients are given by the formulae:

$$ r\,\frac{\sigma_y}{\sigma_x} \quad \text{and} \quad r\,\frac{\sigma_x}{\sigma_y} $$

and the equations giving the most probable value of y for any given value of x and
vice versa are obtained by

$$ (a) \ldots \quad (y - \bar{y}) = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}) $$

$$ (b) \ldots \quad (x - \bar{x}) = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y}) $$

From equations (a) and (b) it is evident that the expressions $r\,\frac{\sigma_y}{\sigma_x}$ and $r\,\frac{\sigma_x}{\sigma_y}$ give
the tangents of the angles that the lines (a) and (b) make with the X axis and the Y axis respectively.

Now $r\,\frac{\sigma_y}{\sigma_x}$ represents the rate of increase of y for unit increase of x. In our
Pusa 12 wheat example, $r\,\frac{\sigma_y}{\sigma_x} = 4.011$. This indicates that for every change of one
unit of x, which is the independent variable, there is a corresponding change of 4.011
in Y, the dependent variable. For each increase of 0.5 cm. in the length of ear-heads
in Pusa 12 wheat the number of grains increases on an average by 2.0055.

Similarly $r\,\frac{\sigma_x}{\sigma_y} = 0.16995$ gives the rate of increase of x for unit increase of y.
In the example in question the number of grains per ear-head, which is the independent
variable, increases by 5 and the dependent variable increases at an average
rate of 0.84975 cm.

[Table XXI. Correlation surface and average deviation per class in Pusa 12 wheat (1930-31)]

Significance of an observed correlation


The probable error of the coefficient of correlation is given by the formula:

$$ \frac{\pm 0.6745\,(1 - r^2)}{\sqrt{n}} $$

This tells us that it is an even chance that any subsequent determination of the
coefficient, based on a similar sample, will fall within these limits. The formula
given for the probable error of the coefficient of correlation is accurate only when
n is not very small, since with large samples and moderate and small values of r
the correlation is distributed normally about the true value of the coefficient of
correlation of the universe. With small samples, less than a hundred, the value
of r is often very different from the true value and the factor $1 - r^2$ is consequently
in error. Therefore, tests of significance based on the probable error or standard
error for small samples are not always very reliable.


In testing the significance of an observed correlation it is necessary to determine
the probability that such a value would occur in a random sample drawn from a
population in which the two variables are not correlated. If this probability is
small, i.e., at the 5 per cent or at the 1 per cent level of significance, the correlation
coefficient may be considered to be significant. Significance depends upon
the size of r and upon the number of observations in the sample from which r is
calculated, and can readily be determined by the application of Fisher's 't' table
or by a special table (Fisher's Table V-A) which he has elaborated. If n' be the
number of pairs of observations on which the correlation is based, and r the
correlation obtained, then,

$$ t = \frac{r}{\sqrt{1 - r^2}}\,\sqrt{n' - 2} \qquad (37) $$

and it may be demonstrated that the distribution of 't', so calculated, will agree
with that given in the table. The 't' table is entered with degrees of freedom equal
to the number of pairs of observations less 2, i.e., n' − 2.

Fisher's Table V-A allows this test to be applied directly from the value of r, for
samples up to 100 pairs of observations. Taking the four definite levels of significance,
represented by P = 0.10, 0.05, 0.02 and 0.01, the table shows for each value
of n, from 1 to 20, and then by larger intervals to 100, the corresponding significant
values of r.





Example 16. In 16 pairs of observations the coefficient of correlation between
the breakage of rice grains in milling and the temperature of unhusked rice was
found to be 0.848 (page 69). What is the probability that such a value would
be obtained by random sampling from data in which the variables are uncorrelated?

Applying the formula

$$ t = \frac{r}{\sqrt{1 - r^2}}\,\sqrt{n' - 2} $$

we have r = 0.848, and from Pearson's Table VIII we get $1 - r^2 = 0.2809$,

therefore,

$$ t = \frac{0.848}{\sqrt{0.2809}}\,\sqrt{16 - 2} = 5.9866. $$

The 't' table must now be entered with degrees of freedom equal to n' − 2 = 14
and we find that the expected value of 't' at the P = 0.01 level is 2.977; the
observed value, 5.9866, being much greater than the expected value, we conclude that
a coefficient of this magnitude would not occur once in a hundred trials as the result
of chance errors and therefore it is significant.
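The value of 't' in Example 16 can be verified with a few lines of Python. This is a sketch only: the function name is an assumption, and the critical value 2.977 is simply the tabular figure quoted in the text, not something the code derives.

```python
from math import sqrt


def t_for_correlation(r, n_pairs):
    """Value of 't' for a correlation r based on n_pairs pairs of observations."""
    return r * sqrt(n_pairs - 2) / sqrt(1 - r ** 2)


t = t_for_correlation(0.848, 16)
critical_t = 2.977  # 1 per cent point for 14 degrees of freedom, as quoted in the text

print(f"t = {t:.4f} with {16 - 2} degrees of freedom")
print("significant at the 1 per cent level" if t > critical_t else "not significant")
```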

Fisher's Table V-A shows that for 16 pairs of observations at the 1 per cent
level of significance a value of r of 0.6226 or more is significant, the table being entered
with n = n' − 2, in this case n = 16 − 2 = 14. Our observed value of 0.848 being
much higher we can safely conclude that the correlation is statistically significant.

The distribution of different values of r at the 5 per cent and the 1 per cent 
levels of significance for degrees of freedom of 1 to 30 and thence by larger intervals 
to 100 in simple correlations of two variables is shown in Fig. 12. From these curves 
it can be seen that with 14 degrees of freedom the significant value of r at the 1 per 
cent level is 0*623. 

Fisher makes use also of another method of determining the significance of r,
which is based on the following transformation:

$$ z = \tfrac{1}{2}\,\{\log_e (1 + r) - \log_e (1 - r)\} \qquad (38) $$

and values of z corresponding to values of r are tabulated in his Table V-B. The
standard error of z is simply

$$ \sigma_z = \frac{1}{\sqrt{n' - 3}} \qquad (39) $$

and is independent of the value of correlation in the sample under consideration.

In the case of our example, applying Table V-B, we see that for

r = 0.8483
the value of z = 1.25

and the standard error of z

$$ = \frac{1}{\sqrt{16 - 3}} $$
z is a difference between two logarithmic values and to estimate the significance
of this difference we must express it in terms of its standard error and refer to the
tables of the probability integral. Thus,

$$ \frac{z}{\sigma_z} = \frac{1.25}{1/\sqrt{13}} = 1.25 \times \sqrt{13} = 4.506. $$

Referring to Pearson's Table II, we find that when this ratio is 4.506 the greater
area of the curve cut off by the ordinate is 0.9999966 and the area in the two tails
is, therefore, 2 × 0.0000034. Therefore the observed value of the coefficient of
correlation will be exceeded by chance only 68 times in ten million trials.
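The z transformation and its standard error are simple to compute directly. In the Python sketch below the function names are illustrative assumptions, and the tail area is taken from the complementary error function of the standard library rather than from Pearson's Table II, so the probability may differ a little from the tabular figure in its last digit.

```python
from math import erfc, log, sqrt


def fisher_z(r):
    """Fisher's transformation z = 1/2 {log_e(1 + r) - log_e(1 - r)}."""
    return 0.5 * (log(1 + r) - log(1 - r))


def z_standard_error(n_pairs):
    """Standard error of z for n_pairs pairs of observations."""
    return 1 / sqrt(n_pairs - 3)


z = fisher_z(0.8482)
ratio = z / z_standard_error(16)

# Area in the two tails of the normal curve beyond +/- ratio.
two_tail_area = erfc(ratio / sqrt(2))

print(f"z = {z:.4f}, ratio to standard error = {ratio:.3f}")
print(f"area in the two tails = {two_tail_area:.7f}")
```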


Significance of difference between two observed correlations 

Two tests with Taungdeikpan rice in Burma [Rhind and U. Tin, 1933] showed
that the coefficients of correlation between the breakage of rice grains in milling
and the temperature of unhusked rice were 0.8912 in a sample of 12 pairs and 0.8482
in a sample of 16 pairs of observations. Are these values significantly different?
This can be determined as follows by transforming the values of r into z:

Table XXII

Correlation coefficients of breakage of rice grains and the temperature of unhusked rice

Samples        r         z         n' − 3    Reciprocal
1st sample     0.8912    1.4276    9         0.11111
2nd sample     0.8482    1.2496    13        0.07692

Difference     0.0430    0.1780    Sum       0.18803


We find that the difference in the values of z is 0.1780 and that the standard
error of the difference is, of course, the square root of the sum of the reciprocals
of 9 and 13.

$$ \text{Standard error of difference of } z = \sqrt{0.11111 + 0.07692} = 0.4336. $$

The difference between the two values of z, viz., 0.1780, being smaller than
the standard error of the difference is not statistically significant.
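The comparison of two correlations reduces to three lines of arithmetic, sketched here in Python. The function names are illustrative assumptions; the coefficients and sample sizes are those of the two rice samples.

```python
from math import log, sqrt


def fisher_z(r):
    """Fisher's transformation of a coefficient of correlation."""
    return 0.5 * (log(1 + r) - log(1 - r))


def compare_correlations(r1, n1, r2, n2):
    """Return the difference of the two z values and its standard error."""
    difference = fisher_z(r1) - fisher_z(r2)
    standard_error = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return difference, standard_error


difference, standard_error = compare_correlations(0.8912, 12, 0.8482, 16)
print(f"difference of z = {difference:.4f}")
print(f"standard error  = {standard_error:.4f}")
print(f"ratio           = {difference / standard_error:.2f}")
```

A ratio well below 2 agrees with the conclusion in the text that the two coefficients do not differ significantly.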


Long-time trends 

If it is desired to determine the general inclination of movement in a series,
such as average yields per acre or trend of wholesale prices of crops, e.g., rice, wheat,
linseed, etc., over a long period of years we may simply represent graphically the
upward or downward tendencies as shown in the continuous line chart given in
Fig. 13 or, if a precise measurement is to be made of the degree or extent of the rise
and fall in the course of a series, we may employ special methods, such as have been
described by Harper [1930] and other workers.
For series which show a general movement in one direction, the straight line
trend of least squares, or the first degree parabola, is usually applied. This straight
line is so fitted that the summation of the squares of the deviations from it is less
than the summation of the squares of deviations from any other straight line that
can be drawn. A concrete example of this is given below and will illustrate the
technique involved. If, however, a series shows a marked upward slope and then
moves downwards at a decided slope, it cannot properly be represented by a straight
line trend but will require the application of the logarithmic, or the second degree
parabola. The discussion of the latter method, however, is out of the scope of
this book.

Example 17. 

Soil fertility experiment in the Pusa Farm. 413 acres were cropped under a three-year
rotation designed to supply grain of cereals, such as oats, maize, etc., and
some pulses, as well as the supply of fodder for the upkeep of the pedigree dairy herd.
Yields for the 15-year period, 1912-13 to 1926-27, are given in the second column
of Table XXIII. The production of grain shows a steady upward tendency during
this period. Fit a straight line of least squares to prove that there is a steady
upward trend of fertility.


Table XXIII

Straight line trend of the total yield of grain (cereals and pulses) of 13 fields (413 acres)
in Pusa Farm *

Year       Total yields    Deviation from    Deviation    Product of      Ordinates of the
           of grain,       the point of      squared      deviation       straight line trend
           in maunds       origin                         and yield       of least squares
           y               x                 x²           xy              Y
1912-13    3,626           1                 1            3,626           3386.2169
1913-14    3,297           2                 4            6,594           3583.5383
1914-15    2,987           3                 9            8,961           3780.8597
1915-16    4,254           4                 16           17,016          3978.1811
1916-17    4,499           5                 25           22,495          4175.5025
1917-18    4,662           6                 36           27,972          4372.8239
1918-19    4,982           7                 49           34,874          4570.1453
1919-20    4,262           8                 64           34,096          4767.4667
1920-21    4,381           9                 81           39,429          4964.7881
1921-22    6,153           10                100          61,530          5162.1095
1922-23    5,189           11                121          57,079          5359.4309
1923-24    4,536           12                144          54,432          5556.7523
1924-25    7,517           13                169          97,721          5754.0737
1925-26    5,984           14                196          83,776          5951.3951
1926-27    5,183           15                225          77,745          6148.7165

Total      71,512          120               1,240        627,346

* Data from Scientific Reports of the Agr. Res. Inst., Pusa, 1926-27, p. 73.
The straight line trend may be calculated by solving the two normal equations,
(41) and (42).

$$ Y = a + bx \qquad (40) $$

where Y is the ordinate of the straight line, a the starting point, b the slope of the
line and x the distance from the point of origin. The following normal equations
may be constructed for the formula Y = a + bx

$$ \Sigma y = na + b\,\Sigma x \qquad (41) $$

$$ \Sigma xy = a\,\Sigma x + b\,\Sigma x^2 \qquad (42) $$

Substituting the proper values from Table XXIII and solving the two normal
equations, we have:

(1) ... 71512 = 15 a + 120 b

(2) ... 627346 = 120 a + 1240 b

Eliminating a by multiplying equation (1) by 8 and equation (2) by 1, we get,

(3) ... 572096 = 120 a + 960 b

(4) ... 627346 = 120 a + 1240 b

(5) ... −55250 = −280 b [By subtraction of equation (4) from equation (3).]

therefore (6) ... 197.3214 = b.

In equation (1), we have seen that
15 a + 120 b = 71512.

Therefore, by substituting the value of b, we get,

15 a + (120 × 197.3214) = 71512
∴ 15 a = 71512 − (120 × 197.3214)
and a = 3188.8955.

Since b = 197.3214, we conclude that the slope b of the straight line trend
of least squares is a positive quantity, and hence that the yield of grain from the
whole area of 413 acres has shown an upward tendency at an average rate of 197.3214
maunds per year since 1912-13. Now to determine the ordinates of the straight
line of least squares we simply have to substitute actual values in the formula
Y = a + bx. For instance the ordinate for 1912-13 is:

Y = a + bx

= 3188.8955 + (197.3214 × 1)

= 3386.2169

Similarly the ordinate for 1913-14 is

Y = a + bx

= 3188.8955 + (197.3214 × 2)

= 3583.5383.
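The solution of the two normal equations can be sketched in Python as follows. The function names are illustrative assumptions; the five sums are those substituted into equations (1) and (2) above. The code solves the equations by the usual closed-form elimination rather than by the step-by-step multiplication shown in the text, so a and b agree with the hand values only to the rounding used there.

```python
def fit_trend(n, sum_x, sum_y, sum_x2, sum_xy):
    """Solve the normal equations
        sum_y  = n * a     + b * sum_x
        sum_xy = a * sum_x + b * sum_x2
    for the starting point a and the slope b of the straight line trend."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = fit_trend(n=15, sum_x=120, sum_y=71512, sum_x2=1240, sum_xy=627346)


def ordinate(x):
    """Ordinate of the trend line x years from the point of origin."""
    return a + b * x


print(f"a = {a:.4f}, b = {b:.4f}")
for year, x in (("1912-13", 1), ("1913-14", 2), ("1926-27", 15)):
    print(year, round(ordinate(x), 2))
```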



Similarly the ordinates for the years following are obtained in the same way, the
value of x increasing by 1 for each year above 1912-13, or simply by adding the
value of b = 197.3214 to the value of the ordinate of the previous year. These
are shown in the last column Y of Table XXIII.

The actual data in the second column of this table may now be represented
graphically as shown by the broken line in Fig. 13; the straight line trend of least
squares, plotted on the ordinates shown in column 6, is represented by the thick
line.


Fig. 13. Straight line trend of the total yield of grain of 13 fields (413 acres) in Pusa Farm, 1912-13 to
1926-27 (yield of grain in maunds plotted against year)

The trend, therefore, definitely shows a steady upward tendency.


CHAPTER VIII 
FIELD TRIALS 

Field experiments generally have for their object the testing of the relative
yielding powers of different varieties or the comparison of the results obtained by
application of different manures or varying cultural methods. In any case,
whether the comparison is between varieties, manures, or cultural methods, the
variables under study are referred to as treatments. Thus, in a comparison of
yielding power of four varieties of linseed, the experiment contains four different
treatments. The testing of varieties in the field presents special difficulties, for
in no other type of experiment are there so many factors which are outside the
control of the operator. Comparisons of the yields of different varieties grown in
the same field in different years are affected by variations in climate, since
particular varieties will respond differently in seasons of heavy and light rainfall and
a variety which proves the heaviest yielder in one season may prove inferior under
different conditions in the next season. Similarly the effects of different manures
may vary with climatic factors. When comparisons are made between yields of 
different varieties in the same years and in the same field, the varying fertility of 
different parts of the field introduces an unknown factor into the experiment and 
it may be said at once that the development of modern methods of conducting 
field trials has for its object the elimination, as far as possible, of the influence of 
this unknown variable upon the result of the experiment. The operator can control 
the manurial treatment, the method of cultivation and the particular varieties sown 
and in any one year in any particular field all treatments will be subject to the 
same climatic conditions, but there remains, however, a large and unestimated 
variation due to differences in the fertility of different parts of the field. This soil 
heterogeneity may, as is shown in a subsequent chapter, be determined for a 
particular field and its effects allowed for in subsequent experiments in the same 
field. Generally, however, the determination of soil heterogeneity in a field involves 
a lengthy series of experiments and considerations of time and space render it 
impossible to precede every field trial by determinations of soil heterogeneity. 

Since the fertility in a field generally varies in an irregular manner, the distri- 
bution of patches of high and low fertility follows no definite plan ; in other words, 
we are generally faced by a random distribution of soil fertility. A common type 
of soil heterogeneity is one in which there is a distinct drift or gradient from high 
to low fertility across the field from one side to the other ; a fertility gradient may, 
of course, be combined with irregular variations in fertility. In order to eliminate 
the effects of the random variation in soil fertility or of the fertility gradient or of 
both, the experimenter seeks to distribute his treatments about the field in such 
a way that each treatment shall have an equal chance of experiencing all varying
grades of soil fertility. In a broad sense such distribution involves the scattering
of treatments about the field at random and the replication of each treatment many
times. The experiment must be designed in such a way as to yield both an efficient
comparison of the treatments and an estimation of the statistical significance of 
the observed differences between treatments. Both these results can be achieved 
by the replication of treatments in a number of small plots. The design and tech- 
nique of experiments will vary with the nature of the trial to be carried out but 
the fundamental principles outlined below will apply to all experiments. It cannot 
be too strongly emphasized that the success of an experiment depends upon a correct 
design and care in the field operations. No amount of statistical juggling with the 
data will compensate for inaccuracy in the field. 


Experimental technique 

(1) Selection of soil. The field selected for comparative trials should be truly 
representative of the soil and other conditions under which the crop to be experi- 
mented with is normally grown and the land should have been previously cropped 
in such a manner as to have been kept in an uniform state of productivity. In a 
plant-breeding station in which fields are generally being used for the study of a 
large number of cultures of different varieties it is a good practice to rotate a bulk 
crop with plant-breeding plots in order to keep the land in an uniform condition. 
The land selected for comparative trials should possess good drainage and should 
be of average fertility, neither very high nor very low. It should have no large 
trees in the neighbourhood as they are likely to disturb the growth of the crop by 
their lengthy root-systems. The previous cropping history of the land should be 
known ; it is obviously unsuitable to take land for varietal trial which has been 
used in the previous season for a comparison of different manures or vice versa. 

(2) Size and shape of plots. The size and shape of the plot will depend obviously
on the nature of the crop to be grown; it is evident that a size and shape which
is suitable for a crop such as wheat will not give a reliable result with a crop such
as sugarcane. It may be laid down as a general rule that small areas of land are
more uniform in fertility than large areas and hence trials with cereal crops (e.g.,
wheat, barley, etc.) should be carried out with individual plots of not more than
one-fortieth of an acre, while for sugarcane, plots of about one-twentieth of an acre
are of a suitable size.

Increasing plot size results, in the case of small cereals, in a rapid decrease in
the variability of the yield, up to a size of one-fortieth of an acre; with plots larger
than this size the continued decrease in the standard deviation is very small as is
shown in Fig. 14. If it is desired to increase the area under an experiment it is
much more effective to do so by increasing the number of replications, which may
be easily done with plots of small area. The smaller the area of the unit the larger
the number of replications it is possible to include in the experiment and for this
reason, with cereal crops, it is possible to obtain accuracy with areas of plots much
smaller than one-fortieth of an acre.

Plots are always rectangular in shape but may range from squares to long strips.
The square shape has the advantage that the edge effect is reduced to a minimum,
but for certain crops (e.g., those in which the individual plant is large, as pigeon-pea
and sugarcane) the long plot has practical advantages, such as ease in cultivation,
etc.



Fig. 14. Reduction in standard deviation due to increase in the size of plots (After Hayes and
Garber, p. 74).

(3) The arrangement of treatments. The selection of the best arrangement is
greatly facilitated if there is any information as to the distribution of fertility in 
the field. This can be obtained by a previous testing of the field as a whole by 
growing a bulk crop and harvesting it in small rectangular plots. If the yields of 
these are plotted on a graph paper in the position in which they appear on the field 
a fairly accurate idea of the relative merits of different parts of the field can be ob- 
tained, and any tendency for the fertility to grade in one or more directions is at 
once obvious. It is, of course, not always possible to carry out such a preliminary 
test and the modern systems of distributing treatments in a field are designed to 
give effective results even when such information is not available. 

(a) Paired plots. When only two varieties or two manurial or other treatments
are to be compared they may be laid out in long strips across the field in the order

A A B B A A B B A A

and the significance of the results determined by Fisher’s ‘t’ or Student’s ‘z’
method. The pairing of contiguous plots is an essential condition in the
use of Student’s ‘z’ test in field trials. A lay-out such as this is a useful method
for comparing one treatment with another, since it gives paired contiguous plots
which, by reason of their much greater length than breadth, should each of them
experience the varying grades of fertility in the field to a corresponding degree,
or at least the experience of contiguous plots in this respect should be more or less
similar. The object of arranging the paired plots in the series AB BA AB BA
and not in the series AB AB AB AB is, of course, to ensure that in any pair
of plots one treatment is not always on the same side of the other and thereby
experiencing the advantage of a possible fertility gradient. Beaven’s half-drill
method is based on the same principle and the statistical handling of such a lay-out
is relatively simple.

(b) The chess-board method. This method is convenient when several treatments
are to be compared and the quantity of seed available is limited, as may well
be the case at a relatively early stage in the evolution of new types in plant-breeding
investigations; it is most suitable for use with small cereals such as wheat and
barley. The arrangement of plots in a trial with 6 treatments is

A B C D E F A
B C D E F A B
C D E F A B C
D E F A B C D
E F A B C D E
F A B C D E F
A B C D E F A
and so on.


The size of plots in the case of small cereals is generally 4 ft. × 4 ft. and by cutting
away a 6-in. border the yield is actually taken from a square yard. In this type of
experiment the same number of seeds must be sown in each plot and hence the field
work is laborious. The advantage of this method is that with such small plots a
large number of replications can be distributed in a field and therefore each
treatment should have an equal chance of experiencing the natural advantages or
disadvantages of the variations in the fertility of different parts of the field.
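
The systematic character of the chess-board can be seen by writing the arrangement
out mechanically. The following is only a minimal sketch in Python; the function name
`chessboard_layout` and its arguments are hypothetical and chosen for illustration. It
assumes that each row is the row above it moved one place along, as in the
arrangement for 6 treatments given above.

```python
from string import ascii_uppercase


def chessboard_layout(n_treatments, n_rows, n_cols):
    """Return a systematic chess-board plan as a list of rows.

    The plot in row r, column c receives treatment number (r + c) modulo
    the number of treatments, so every row is the previous row shifted
    one place to the left.
    """
    labels = ascii_uppercase[:n_treatments]
    return [
        [labels[(r + c) % n_treatments] for c in range(n_cols)]
        for r in range(n_rows)
    ]


layout = chessboard_layout(n_treatments=6, n_rows=7, n_cols=7)
for row in layout:
    print(" ".join(row))
```

Because the plan is fixed by rule and not by chance, it secures an even spread of
treatments over the field but involves no randomization.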

(c) The Latin Square. This arrangement resembles the chess-board but there
are two restrictions and the arrangement of types is not systematic. The restrictions
are that there are as many replicates of each treatment as there are treatments,
the plots being arranged in a rectangle with as many rows as columns, and that each
treatment occurs once in each row and once in each column. This provides for a
double elimination of the influence of soil heterogeneity in two directions at right
angles to one another, i.e., between columns and between rows. In the following two
arrangements an incorrect and a correct method of distributing treatments in a 5 × 5
Latin Square are illustrated :—

Incorrect method of replication.

A B C D E
B A C D E
C B A E D
D E B A C
E C D B A

Correct method of replication.

A B C D E
D C B E A
E A D C B
B D E A C
C E A B D

Although in the incorrect method there is randomization of treatments in the
rows, there is none in the columns, since in three of the columns the same
treatment occurs in adjacent plots. In the correct method of randomization it
will be found, however, that the same type occurs once only in each row and in each
column, thereby greatly reducing the influence of soil heterogeneity. The Latin
square is only a square in a conventional sense, for the actual shape of the plots
may be made to suit the available area and the crop. Plots may be square or rectangular
but precision is lost if they are too long and narrow. The statistical
analysis of this lay-out, as well as that of randomized blocks dealt with below, is
generally carried out by Fisher’s Analysis of Variance.
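
The two restrictions of the Latin Square are easily verified for any proposed plan.
The sketch below, in Python, is offered only as an illustration; the function name
`is_latin_square` is hypothetical, and the two plans are the incorrect and correct
arrangements as read from the text above.

```python
def is_latin_square(square):
    """True if every treatment occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    # zip(*square) turns the rows into columns.
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


incorrect = ["ABCDE", "BACDE", "CBAED", "DEBAC", "ECDBA"]
correct = ["ABCDE", "DCBEA", "EADCB", "BDEAC", "CEABD"]

print(is_latin_square(incorrect))
print(is_latin_square(correct))
```

The check confirms only that the restrictions are satisfied; it says nothing about
whether the plan was chosen at random from among the permissible squares.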

(d) Randomized blocks. When several treatments are to be compared the experiment
may be arranged in the form of randomized blocks by dividing the experi-
mental area into regular blocks of equal areas in each of which the treatments are 
distributed at random, each treatment occurring once only in each block. The 
individual blocks can be distributed in any manner, that is, either parallel to each 
other across the field or in two directions. It is an advantage if the area can be 
divided in two directions at right angles to one another so that all the blocks are 
not lying side by side. The advantage of the randomized block arrangement is 
that replication is secured, and at the same time it is easy to distinguish the soil 
variation within the blocks from that between the different blocks, since the total 
yield of any block is comparable with that of any other block in that each contains 
one plot of each treatment. The variations in the total yields of blocks may, there- 
fore, be ascribed to soil differences and these gross differences can be eliminated from 
the comparisons of the different treatments. The fertility of the field may be
broadly classified into — 

(1) A major fertility variation usually marked by a fertility gradient ; and 

(2) Sporadic fertility variations, not systematic but distributed in patches. 

The object of dividing the field into blocks is to calculate the variance in yield due
to the effects of the first type of fertility and to eliminate this variance in arriving
at an estimate of the variance due to chance error. The object of randomizing the
treatments within the blocks is to eliminate the effects of the sporadic variations
in fertility.

The randomized block method allows of any number of replications, unlimited
by the number of treatments involved. It has also the advantage that if a part
of the experiment is damaged by some agricultural disaster (e.g., insects, floods,
water-logging, etc.) it is possible to discard entirely one or two blocks without
destroying the entire experiment. A reduction in the number of replications
would, of course, lead to a larger standard error in the experiment but would at
least furnish a result of some value.

(4) Replications. The errors in a field experiment may, broadly speaking, be
divided into three classes :—

1. Errors due to soil heterogeneity;

2. Errors due to faulty technique; and

3. Errors due to chance.

The errors due to soil heterogeneity we seek to eliminate by the random arrangement
of plots on the methods previously outlined and by the replication of treatments
many times over. The errors due to faulty technique vary inversely with
the skill, care and experience of the experimenter; with the trained worker they
should be non-existent or negligible. The errors due to chance come in unsuspected
by the experimenter and are governed by the mathematical laws of probability,
the principles of which have been described in previous chapters. Chance errors
can, therefore, be estimated by submitting the data to an appropriate mathematical
analysis and in general, as has been shown, the importance of chance errors in an
experiment varies inversely as the square root of the number of replications, since
the formula for the estimation of error contains the square root of the number of
observations as the divisor.

Thus, to double the precision of a result obtained from 25 plots, we should have
to take into the experiment 100 plots, since √25 = 5 and √100 = 10. The larger
the number of replications, however, the greater the area of land and consequently
the greater is the variability of productivity. Fig. 9, which shows the relationship
between the values of ‘t’ and the size of the sample, brings out clearly how the
reliability of the result depends upon the number of observations. Not less than
five replications are essential for reliability. The size of sample necessary for a
given degree of precision can be determined by Mahalanobis’ Table of “/”.
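
The effect of the square root in the divisor may be illustrated by a short calculation.
This is a sketch only, in Python; the standard deviation of 10 units per plot is a
hypothetical figure chosen for illustration and is not taken from any experiment
in this book.

```python
import math


def standard_error(standard_deviation, n_plots):
    """Standard error of a mean based on n_plots observations."""
    return standard_deviation / math.sqrt(n_plots)


sd = 10.0  # hypothetical standard deviation of a single plot yield

se_25 = standard_error(sd, 25)
se_100 = standard_error(sd, 100)

# Four times as many plots are needed to halve the standard error.
print(se_25, se_100, se_25 / se_100)
```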

(5) Randomization. Systematic arrangement of plots in any definite order 
may increase or decrease our estimates of the experimental error. This is more 
pronounced if the lay-out happens to be such that the long axes of the plots lie 
parallel to a fertility gradient running from one side of the field to the other. Hence, 
a representative distribution of the strains in an experimental field brought about 
by randomization is bound to give a much more reliable result than mere systematic 
repetition. In other words, randomization gives every treatment an equal chance
of falling on any of the different fertility patches. Such a random distribution
gives a statistically valid basis for estimating the standard error due to chance,
on which comparisons between treatments depend.

An easy method of randomization has been suggested by Fisher and is best
explained by means of an example. Suppose that we have to randomize a set of
five treatments in a randomized block experiment with six replications. To do
this a set of random two-figure numbers is selected; the chance opening of a book
furnishes a ready means of selection. Suppose the first random number from the
book is 28; dividing this by 5, the number of treatments, we get a remainder of 3,
and we allot treatment number 3 to the first plot in the first block. Suppose
now that the second number chosen at random is 36; this on division by 5 gives a
remainder of 1 and treatment number 1 will go into the second plot of the first
block. We proceed in this way for all the plots in every block subject to the restriction
that the same treatment may not occur more than once in any block. If
the number chosen is a multiple of 5, it is considered to indicate treatment number
5.
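
The method of remainders just described may be sketched as follows. This is a minimal
illustration in Python and not a prescribed routine: the function names are
hypothetical, and the pseudo-random generator of the `random` module is assumed
here as a stand-in for the chance opening of a book. Two-figure numbers from 00 to
99 are drawn; with 5 treatments every remainder is then equally likely, because 100
is an exact multiple of 5.

```python
import random


def treatment_from_number(number, n_treatments):
    """Remainder on division; a remainder of 0 indicates the last treatment."""
    remainder = number % n_treatments
    return remainder if remainder != 0 else n_treatments


def randomize_block(n_treatments, rng):
    """Fill one block, rejecting any treatment already allotted to it."""
    block = []
    while len(block) < n_treatments:
        number = rng.randint(0, 99)  # a random two-figure number (assumed source)
        treatment = treatment_from_number(number, n_treatments)
        if treatment not in block:
            block.append(treatment)
    return block


def randomize_experiment(n_treatments, n_blocks, seed=None):
    """Return one independently randomized order of treatments per block."""
    rng = random.Random(seed)
    return [randomize_block(n_treatments, rng) for _ in range(n_blocks)]


plan = randomize_experiment(n_treatments=5, n_blocks=6, seed=1936)
for block_number, block in enumerate(plan, start=1):
    print(block_number, block)
```

The numbers 28, 36 and any multiple of 5 give treatments 3, 1 and 5 respectively,
in agreement with the worked example above.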

Tippett [1927] has published 10,400 numbers of random digits and Mahalanobis
[1933] has recently described the use of random sampling numbers in agricultural
experiments and has published a list of 2,000 random numbers. The student is
referred to his paper for the method of using these numbers.

(6) Number of treatments. It frequently happens that at the conclusion of an
investigation on plant-breeding, the research worker possesses a large number of
types of which he requires to investigate the productivity. It is obvious that the
ordinary yield trial cannot be carried out with 50 to 100 varieties. Some rough
preliminary selection must be made of the types which are of good yielding capacity.
The usual practice in the Botanical Section at Pusa is to take the seed weight from
60 plants selected at random from a small plot of a variety. This is a method
suitable for large plants such as pigeon-peas. To do this, it is necessary to grow
small plots of all the types in the same field, the field being of average fertility.
Another method is the rod-row test in which three rod-rows of each type are grown.
The lateral rows are discarded for border effect and the yields of the central rows
only are compared. This method is suitable for small plants, e.g., cereals like
wheat and barley. The number of replications which can be given to a rod-row
trial depends somewhat on the number of varieties involved.

From the results of such a preliminary trial it should be possible to select a 
number of the most productive types and these may be tested further in randomized 
blocks with unit plots of about ^^th acre, with, if possible, 5 replications. The 
number of replications which can be given will naturally depend upon the number 
of types which have been selected for testing, but at this stage this number should
not exceed 20. A further selection can now be made of about half a dozen of the
best yielding types and a standard yield trial carried out in randomized blocks or
Latin squares, with unit plots of the necessary area, as previously described, and
with certainly not less than 6 replications.

(7) Duration of yield trial. It is a fact well known to agriculturists that 
the yield of a crop from the same field varies in different seasons according as the 
climate is favourable or not to the particular crop. It will be readily understood 
that different varieties of a crop will, in addition to their morphological differences, 
possess physiological differences {e.g., water requirements), which will give an advan- 
tage to certain types in certain seasons. It is obviously impossible to carry on 
varietal trials for such a number of years as to ensure that the varieties experience 
a random sample of the climatic variations. In India, the most potent variable 
in the climate is the size and distribution of the monsoon rainfall and successive 
years may vary very greatly in the availability of soil moisture. Varieties which 
yield well in years of adequate moisture may fail to show high yields in seasons 
in which soil moisture is deficient and vice versa. It is, therefore, desirable to carry 
out standard yield trials for at least three successive seasons in the same locality 
in order that a deduction may not be based upon the results of a single season which 
has possibly been unduly favourable to a particular type. It is an advantage 
also to repeat a yield trial in the same season in different localities under different 
conditions of soil and climate. In this way by yield trials under different climatic 
and soil conditions, it is possible to form an estimate of the suitability of particular 
varieties for particular agricultural areas. 

(8) Sowing and rate of seeding. Sowing should be done in rows at a spacing
appropriate to the particular crop and must be carefully supervised; a definite
number or quantity of seed has to be sown in each sub-plot. Broadcasting should
never be done. If from some outside cause, e.g., white ants, plots become damaged,
it is permissible to fill in gaps by re-sowing at early stages. Since the size of seed
in different varieties may vary considerably, the seed-rate used in sowing a variety 
trial should not be based upon weight alone but upon the number of seed per unit 
weight and the percentage germination. An estimation of the number of seeds 
per gramme or any other unit of weight can easily be obtained and a sowing rate 
which will give an approximately equal number of seeds per plot calculated. This 
sowing rate can be increased or decreased according to the percentage germination 
of individual types. Thus, in three varieties of linseed taken for a yield trial, the 
number of seeds per gramme and the seed-rate per acre calculated to give approxi- 
mately the same number of seeds per acre are as shown below : — 


Variety          No. of seeds per grm.     Seed-rate per acre in ounces

Type 12                  222                         266
Hybrid 69                187                         305
Hybrid 64                166                         366


The germination percentage of all three varieties being between 93 and 95 per 
cent, the calculated seed-rate did not require further adjustment. Wherever possible 
the same number of plants should be used in each plot in an experiment. This 
may be achieved either by sowing an equal number of seeds or by thinning the seedlings.
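
The adjustment of seed-rate for size of seed and for germination can be set out as a
short calculation. The sketch below, in Python, is illustrative only: the function
name, the target of 1,600,000 viable seeds per acre and the two varieties are all
hypothetical and are not the figures of the linseed trial quoted above. One ounce
is taken as 28.3495 grammes.

```python
GRAMS_PER_OUNCE = 28.3495


def seed_rate_oz_per_acre(target_seeds_per_acre, seeds_per_gram, germination=1.0):
    """Ounces of seed per acre needed to supply the target number of viable seeds.

    germination is the proportion of seeds expected to germinate (0 to 1).
    """
    viable_seeds_per_ounce = seeds_per_gram * GRAMS_PER_OUNCE * germination
    return target_seeds_per_acre / viable_seeds_per_ounce


target = 1_600_000  # hypothetical number of viable seeds wanted per acre

small_seeded = seed_rate_oz_per_acre(target, seeds_per_gram=200, germination=0.94)
large_seeded = seed_rate_oz_per_acre(target, seeds_per_gram=160, germination=0.94)

# The variety with fewer seeds per gramme needs the heavier seed-rate.
print(round(small_seeded), round(large_seeded))
```

A variety of poorer germination is provided for in the same way, by entering its
own germination proportion in place of the common figure assumed here.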

(9) Harvesting. The harvesting of yield trials demands skilled labour and strict
supervision. The student should remember that it is easy to lay down, at sowing
time, yield trials on a scale which it is impossible for the available staff to handle
with accuracy at harvest. Subject to this limitation there are certain general
precautions which must be observed.

(10) Border effect. A strip on each edge of a plot must be cut away and 
thrown out of the experiment in order to ensure that the yield of each plot is 
only taken from such an area as is unaffected by the growth of contiguous varieties 
or by the presence of adjacent grass, drains or unsown land. The central area 
from which the yield is finally taken is referred to as the ultimate plot.

Each plot should be threshed in situ, if possible, by hand. Hence the desir- 
ability of not having plots so large as to render this impossible. In the Botanical 
Section at Pusa the usual practice is to spread a large cotton (drill) sheet on the 
plot after cutting the crop and to rub by hand or beat out the grain with small wooden 
sticks on this sheet. 

Seed should be thoroughly sun-dried until a uniform weight is reached. In
Pusa this is usually done on a threshing floor which is enclosed and covered by a
wire-net house. (Fig. 15.)

Since the total weights are of a smaller order than is generally the case in farm
yields a more accurate and sensitive type of weighing machine than is generally
used in agricultural practice is required.

CHAPTER IX

STATISTICAL INTERPRETATION OF FIELD EXPERIMENTS

Since the number of plots which can be handled in a field experiment is relatively
small, the statistical interpretation of such experiments is generally a case of the 
statistics of small samples, a subject which has been dealt with in a previous chapter 
(page 50). The modern designs for the lay-out of field experiments have already 
been explained and we have now to consider the available statistical methods which 
can be used in their interpretation. 

The principles underlying the statistical interpretation of field trials will be 
readily grasped by the student who has studied the chapter on the significance of
means. All statistical determinations of significance are fundamentally based 
upon the table of the probability integral and involve a comparison of the observed 
differences between means with a function of the variance, either the standard 
deviation or the standard error or the probable error. The methods that have been 
used in the past and those which have been more elaborated recently may be 
summarized as follows : — 

(1) Bessel’s method. This has already been described in some detail and consists
in determining the means, standard deviations and the probable errors of the means
of the yields of the two varieties under comparison, and then determining the
ratio of the difference of the two means to the probable error of the difference.
If this ratio be more than 3.2, i.e., if the difference exceeds 3.2 times its probable
error, the results are said to be statistically significant. This method is reliable where
there is no correlation between the different observations, as in many chemical
and physical determinations, but it is less reliable in its application to field trials
where there may be a high correlation between the yielding powers of adjacent
plots.

(2) Student’s method. This method requires the tabulation of differences between
the yields of two treatments and the determination of the ratio :—

$$ z = \frac{\text{Mean difference}}{\text{Standard deviation of the difference}} $$

From this ratio the odds are read off from tables which express the probability in
terms of the value of ‘z’. The most important condition of this method is that
the lay-out should be such as to lend itself to the pairing of observations, e.g., yields
of adjacent plots, and that there must be a number of replications to obtain significant
results. The pairing of plots allows for the influence of the correlation between
the yielding powers of adjacent areas.

(3) Fisher’s ‘t’. This method differs from Student’s method in that the standard
deviation is determined from the degrees of freedom and a quantity ‘t’, which
is the ratio of the mean difference to the standard error of the difference, is used
for the estimation of the significance. The table of ‘t’ is entered with the degrees
of freedom and gives the probability that the observed value of ‘t’ will occur as
the result of chance errors.

(4) Engledow and Yule’s method. This is a method by which a number of
varieties can be compared at the same time. The standard deviation of the difference
of two sets of comparable plots from their means is determined and the standard
error of this is calculated by the formula

$$ \text{S.E.} = \frac{\sigma_d}{\sqrt{n}} \qquad (43) $$

The significance of the difference between two varieties is given by the ratio of the
difference of their means to the standard error. If this ratio be greater than 2.1,
the odds against the observed difference being due solely to errors of random
sampling are 30 : 1, and the result is considered to be significant. This method was
elaborated as an extension of Student’s ‘z’ and resembles the use of Fisher’s ‘t’.

The calculation is rather cumbersome and the method has been superseded
by Fisher’s Analysis of Variance.

(5) Fisher’s Analysis of Variance. This is the method utilized for interpreting 
the results obtained from randomized blocks and Latin Square methods of lay-out. 
According to this the total variation in the yields of plots is sub-divided into different 
parts representing : — 

(1) the effect of variety or treatment, 

(2) differences in yield between different blocks, or different rows or columns, and

(3) residual effect representing the random or uncontrolled variation of the
experiment.

The residual variance furnishes the basis for the calculation of the experimental
error. As the variance due to soil differences (due to blocks or to rows and columns)
is eliminated in the analysis of variance, the residual variance, that is that due
to chance errors, furnishes a better criterion for estimating significance than the
standard deviation calculated without any such elimination. Moreover, the
variances are based on those degrees of freedom which contribute to the errors
produced by the various factors causing the variation in yield.
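
The sub-division of the total variation may be sketched for a randomized block
experiment as follows. This is an illustration in Python and not the author's own
computing scheme; the function and variable names are hypothetical. The figures
are the differences from an assumed mean of 70 lb. given in Table XXVII of
Example 20 (4 varieties in 10 blocks); entries that are illegible in the printed
table are assumed here to take the values required by the printed block and
varietal totals.

```python
# Differences from the assumed mean of 70 lb., one list per variety, blocks 1 to 10.
deviations = {
    "A": [-19.0, -12.5, -5.5, -2.5, 8.0, 1.5, -4.5, 10.5, 21.0, -6.5],
    "B": [-25.5, -23.0, -27.0, -21.5, -30.0, -21.0, -11.5, -4.5, 7.0, -5.5],
    "C": [-2.5, 7.0, 10.0, 12.0, 15.0, 7.0, 17.5, 24.5, 31.0, 11.0],
    "D": [-11.0, 4.0, 3.0, 1.5, 14.5, -3.0, 4.0, 13.5, 12.0, 4.0],
}


def randomized_block_anova(data):
    """Split the total sum of squares into blocks, varieties and residual.

    Returns a dict mapping each source to (sum of squares, degrees of freedom).
    """
    varieties = list(data)
    n_varieties = len(varieties)
    n_blocks = len(data[varieties[0]])
    n_plots = n_varieties * n_blocks

    grand_total = sum(sum(values) for values in data.values())
    correction = grand_total ** 2 / n_plots

    total_ss = sum(x * x for values in data.values() for x in values) - correction
    variety_ss = sum(sum(values) ** 2 for values in data.values()) / n_blocks - correction
    block_totals = [sum(data[v][b] for v in varieties) for b in range(n_blocks)]
    block_ss = sum(t * t for t in block_totals) / n_varieties - correction
    residual_ss = total_ss - variety_ss - block_ss

    return {
        "blocks": (block_ss, n_blocks - 1),
        "varieties": (variety_ss, n_varieties - 1),
        "residual": (residual_ss, (n_blocks - 1) * (n_varieties - 1)),
        "total": (total_ss, n_plots - 1),
    }


result = randomized_block_anova(deviations)
for source, (sum_of_squares, df) in result.items():
    print(f"{source:10s} {sum_of_squares:10.3f} {df:3d} {sum_of_squares / df:10.3f}")
```

The residual sum of squares divided by its degrees of freedom is the variance from
which the standard error of the experiment is calculated.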

A few actual examples of the different methods of lay-out and the application 
of the various statistical tests to their results will illustrate the principles described 
in the previous pages.

A YIELD TRIAL BY THE METHOD OF PAIRED PLOTS

Example 18 

Purpose of experiment. — Varietal trial with gram.

Field. — Botanical Section, No. 5.

Varieties. — A and B.

Lay-out. — Paired strips — A A B B A A B B

No. of replications. — 6.

Size of plot. — 84' × 13' = 1,092 sq. ft.

Size of ultimate plots after removing the necessary borders. — 80' × 9'


Fig. 1%. Plan of a yield trial with paired plots (Example 18)


Table XXIV

Plot yields and calculations of Example 18

Replication     Yield of A     Yield of B     Difference d (B − A)     (B − A)²
     1               2              3                  4                   5

     1             26.90          34.59               7.69              59.1361
     2             25.62          32.80               7.18              51.5524
     3             25.36          32.29               6.93              48.0249
     4             26.39          30.49               4.10              16.8100
     5             27.16          34.34               7.18              51.5524
     6             23.32          35.36              12.04             144.9616

Totals            154.75         199.87              45.12             372.0374

Means              25.79          33.31               7.52                ..


In the above table the differences in the yields of paired plots are taken down
in column 4 and in this case all of them happen to be positive differences in favour of
variety B. The sum of these differences divided by the number of pairs of plots
gives the mean difference of the experiment. Column 5 shows the square of each
of these differences. The sum of all these squares gives the value required
for determining the standard deviation of the difference.


Mean difference = 45.12 / 6 = 7.52 lb.

Standard deviation of the difference,

$$ s = \sqrt{\frac{S(d^2)}{n-1} - \frac{(Sd)^2}{n(n-1)}} = \sqrt{\frac{372.0374}{6-1} - \frac{(45.12)^2}{6(6-1)}} = 2.5588 $$

$$ \text{Standard error} = \frac{s}{\sqrt{n}} = \frac{2.5588}{\sqrt{6}} = 1.044 $$

$$ \text{Fisher's } t = \frac{\text{Mean difference}}{\text{Standard error}} = \frac{7.52}{1.044} = 7.20 $$

Expected value of ‘t’ from Fisher’s table, entering the table with n − 1 = 5 :
For 1 per cent level of significance = 4.032
and for 5 per cent level of significance = 2.571.

The observed value of ‘t’, viz., 7.20, being higher than the expected values, we
conclude that the difference in the yields of varieties B and A is statistically significant
and hence B is significantly a higher yielder than A.

Since t = d / S.E., as shown above,
the critical difference d = t × S.E.,
and for the 1 per cent level d = 4.032 × 1.044 = 4.2094. As the observed difference
of 7.52 lb. is greater than the expected critical difference of 4.2094 lb., it is obvious
again that the difference is statistically significant.

Type A was an established variety of proved yielding power and type B was a 
new variety which was being tried against A. It is, therefore, convenient to 
express the critical difference and the mean difference between A and B as per- 
centages of the mean yield of the control, A. 

We obtain in this case the critical percentage difference at the 1 per cent level

$$ = \frac{\text{critical difference} \times 100}{\text{Mean A}} = \frac{4.209}{25.79} \times 100 = 16.32\ \% $$

Similarly the mean difference as a percentage of the mean yield of the control A is

$$ = \frac{\text{Mean difference} \times 100}{\text{Mean A}} = \frac{7.52}{25.79} \times 100 = 29.16\ \% $$

i.e., variety B has out-yielded variety A by 29 per cent and this percentage is
significant since the critical percentage difference is only 16.

Applying Student’s ‘z’ test we obtain

$$ \text{Standard deviation of the difference, } \sigma_d = \sqrt{\frac{S(d^2)}{n} - \left(\frac{Sd}{n}\right)^2} = \sqrt{\frac{372.0374}{6} - (7.52)^2} = 2.3358 $$

$$ \text{Therefore, Student's } z = \frac{7.52}{2.3358} = 3.219 $$

From Love’s Tables, we see that the odds against a difference of the magnitude of
that observed being due to chance alone are 2499 : 1 when z = 3.2 and n = 6, indicating
that variety B is significantly superior to variety A in yielding power.

Student’s ‘z’ may, as in the present instance, sometimes give very high odds
but from a practical standpoint when using Student’s ‘z’ significance at 30 : 1
or the stricter criterion of 100 : 1 is sufficient.
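
The arithmetic of Example 18 may be checked by a short calculation. The following
is a sketch in Python using the plot yields of Table XXIV; the function and variable
names are hypothetical. The value 4.032 is the 1 per cent value of ‘t’ for 5 degrees
of freedom quoted above and is entered by hand, not computed.

```python
import math

yield_a = [26.90, 25.62, 25.36, 26.39, 27.16, 23.32]
yield_b = [34.59, 32.80, 32.29, 30.49, 34.34, 35.36]


def paired_summary(control, new):
    """Fisher's t and Student's z for paired plot yields (new minus control)."""
    d = [y - x for x, y in zip(control, new)]
    n = len(d)
    mean_d = sum(d) / n
    sum_sq = sum(v * v for v in d)

    # Fisher: standard deviation based on n - 1 degrees of freedom.
    s = math.sqrt((sum_sq - sum(d) ** 2 / n) / (n - 1))
    se = s / math.sqrt(n)

    # Student: standard deviation with n as the divisor.
    sigma_d = math.sqrt(sum_sq / n - mean_d ** 2)

    return {
        "mean_d": mean_d,
        "s": s,
        "se": se,
        "t": mean_d / se,
        "sigma_d": sigma_d,
        "z": mean_d / sigma_d,
    }


summary = paired_summary(yield_a, yield_b)

T_1_PER_CENT_5_DF = 4.032  # from Fisher's table, as quoted in the text
critical_difference = T_1_PER_CENT_5_DF * summary["se"]
mean_a = sum(yield_a) / len(yield_a)

print(round(summary["t"], 2), round(summary["z"], 3))
print(round(critical_difference, 2), round(100 * critical_difference / mean_a, 1))
```

Small differences in the last figure from the values printed above arise only from
the rounding of the standard error to 1.044 in the hand calculation.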


A YIELD TRIAL BY THE CHESS-BOARD METHOD

Example 19

Purpose of experiment. — Varietal trial with barley.

Field. — Botanical Section, plot 1 — A.

Varieties. — Barley Types 21 (A), 20 (B), 14 (D) and local (C).

Lay-out. — Chess-board.

The significance of the differences in yielding power between the four types can
be readily determined by Engledow and Yule’s method [Shaw and Bose, 1929].
This method allows the comparison of all the types which are included in the experiment.
Table XXV gives the yields of all plots, the mean yield of each type and
the deviation of each plot yield from the mean of the type. Table XXVI shows the
differences (d) between these deviations, and the squares (d²) of these differences,
for all pairs of contiguous plots in each comparison. Thus the deviation of the
yield of plot 1, Type A, from the mean yield of all plots of Type A is 137.3, and
similarly for plot 2, Type B, the deviation from the mean yield for all B plots is
124.3; the difference “d” in this case is, therefore, 137.3 − 124.3 = 13 and “d²”
is 169. The value of “d²” is calculated in this way for all possible pairs of contiguous
plots. In this experiment there are six possible comparisons of types.


A with B
B with C
C with D
A with C
A with D
and B with D


with 25 pairs in each comparison. There are, therefore, 150, i.e., 25 × 6, values of
“d²” and the standard deviation of the difference is obtained by the formula

$$ \sigma_d = \sqrt{\frac{S(d^2)}{150}} $$

The standard error is given by

$$ \frac{\sigma_d}{\sqrt{n}} $$

where n is the number of plots under each type, in this case 25.

Table XXV

Calculation of statistical significance by Engledow and Yule’s Method in Example 19

          Yields in grms.                  Deviations from the mean
    A      B      C      D           A         B         C         D

   452    385    270    526        137.3     124.3     145.5     249.5
   352    546    210    318         37.3     285.3      85.5      41.5
   450    385    217    335        135.3     124.3      92.5      58.5
   455    328    227    386        140.3      67.3     102.5     109.5
   372    325    202    395         57.3      64.3      77.5     118.5
   262    315    113    157        -52.7      54.3     -11.5    -119.5
   275    183     95    123        -39.7     -77.7     -29.5    -153.5
   332    127    121    240         17.3    -133.7      -3.5     -36.5
   252    102     85    228        -62.7    -158.7     -39.5     -48.5
   442    204     62    195        127.3     -56.7     -62.5     -81.5
   204    182    108    214       -110.7     -78.7     -16.5     -62.5
   208     58    210    233       -106.7    -202.7      85.5     -43.5
   280    248     66    295        -34.7     -12.7     -58.5      18.5
   278    261     60    255        -36.7       0.3     -64.5     -21.5
   201    370     35    168       -113.7     109.3     -89.5    -108.5
   231    212     39    243        -83.7     -48.7     -85.5     -33.5
   313    173     84    282         -1.7     -87.7     -40.5       5.5
   181    200     22    265       -133.7     -60.7    -102.5     -11.5
   312    256     72    187         -2.7      -4.7     -52.5     -89.5
   182    158    172    178       -132.7    -102.7      47.5     -98.5
   472    445    222    318        157.3     184.3      97.5      41.5
   380    123    135    459         65.3    -137.7      10.5     182.5
   316    310     56    253          1.3      49.3     -68.5     -23.5
   358    288    122    312         43.3      27.3      -2.5      35.5
   308    333    108    348         -6.7      72.3     -16.5      71.5

Mean 314.7  260.7  124.5  276.5      ..        ..        ..        ..

S(d²) = 273353.00 + 295709.40 + 150824.00 + 143674.00 + 166856.00
        + 363647.40 = 1384063.80

$$ \sigma_d = \sqrt{\frac{1384063.80}{150}} = 96.06 $$

$$ \text{S.E.} = \frac{\sigma_d}{\sqrt{n}} = \frac{96.06}{\sqrt{25}} = 19.212 $$

The “ significance ” of the difference between any two types is given by the 
ratio of the difference of their means to the standard error (S. E.). The ratio for 
each comparison is as follows : — 

A and C = (314.7 − 124.5) / 19.212 = 190.2 / 19.212 = 9.90
B and C = (260.7 − 124.5) / 19.212 = 136.2 / 19.212 = 7.09
D and C = (276.5 − 124.5) / 19.212 = 152.0 / 19.212 = 7.91
A and B = (314.7 − 260.7) / 19.212 = 54.0 / 19.212 = 2.81
B and D = (260.7 − 276.5) / 19.212 = −15.8 / 19.212 = −0.82
A and D = (314.7 − 276.5) / 19.212 = 38.2 / 19.212 = 1.99

When this ratio is greater than 2.1, the chances of superiority of one type over the
other are over 30 : 1.

Thus the superiority of varieties A, D and B over C, and of A over B, are statistically
significant in the order given, according to the magnitude of the ratio, but
the differences between A and D, and B and D, are not significant as the ratios of
the mean difference to the standard error of the experiment are only 1.99 and −0.82
respectively and fall short of the expected ratio of 2.1.
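
The six ratios of Example 19 follow directly from the four means and the standard
error, and may be checked as below. This is a sketch in Python with hypothetical
variable names; the sum of squares 1384063.80, the 150 values of d² and the 25
plots per type are the figures quoted in the text, and 2.1 is the limit of
significance there adopted.

```python
import math
from itertools import combinations

means = {"A": 314.7, "B": 260.7, "C": 124.5, "D": 276.5}

sum_d_squared = 1384063.80  # S(d²) as quoted in the text
n_differences = 150         # 25 pairs in each of 6 comparisons
plots_per_type = 25

sigma_d = math.sqrt(sum_d_squared / n_differences)
standard_error = sigma_d / math.sqrt(plots_per_type)

LIMIT = 2.1  # ratio corresponding to odds of about 30 : 1

ratios = {
    (first, second): (means[first] - means[second]) / standard_error
    for first, second in combinations(means, 2)
}
significant = {pair: abs(ratio) > LIMIT for pair, ratio in ratios.items()}

for pair, ratio in ratios.items():
    verdict = "significant" if significant[pair] else "not significant"
    print(pair[0], "and", pair[1], round(ratio, 2), verdict)
```

The sign of a ratio shows only which of the two types has the higher mean; it is
the magnitude that is compared with the limit.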

In the original paper [Shaw and Bose, 1929] on yield trials with barley these
results are also tested by Bessel’s method and by Student’s ‘z’.


A YIELD TRIAL BY THE METHOD OF RANDOMIZED BLOCKS

Example 20

Purpose of experiment. — Varietal trial with gram.

Field. — Botanical Section, No. 6.

Varieties. — A (control), B, C and D.

Lay-out. — Randomized blocks.

No. of replications. — 10.

Size of plot. — 130' × 16' = 2,080 sq. ft.

Size of ultimate plots after cutting out the necessary borders. — 125' × 12'

[Field plan: each plot is marked with its block, plot number, type and yield in lb.]

Fig. 18. Plan of yield trial with randomized blocks (Example 20)

Table XXVII

Calculation of differences and their squares (Example 20)

          Differences from assumed mean of 70 lb. per plot
Block        A          B          C          D       Block totals   Squares of block totals
  1          2          3          4          5            6                   7

  1       -19.0      -25.5       -2.5      -11.0        -58.0              3364.00
  2       -12.5      -23.0       +7.0       +4.0        -24.5               600.25
  3        -5.5      -27.0      +10.0       +3.0        -19.5               380.25
  4        -2.5      -21.5      +12.0       +1.5        -10.5               110.25
  5        +8.0      -30.0      +15.0      +14.5         +7.5                56.25
  6        +1.5      -21.0       +7.0       -3.0        -15.5               240.25
  7        -4.5      -11.5      +17.5       +4.0         +5.5                30.25
  8       +10.5       -4.5      +24.5      +13.5        +44.0              1936.00
  9       +21.0       +7.0      +31.0      +12.0        +71.0              5041.00
 10        -6.5       -5.5      +11.0       +4.0         +3.0                 9.00

Varietal totals             -9.5     -162.5     +132.5     +42.5     +3.0 (General total)     11767.50
Squares of varietal totals  90.25   26406.25   17556.25   1806.25
Summation of squares of varietal totals = 45859.00

Total sum of squares of all differences in columns 2 to 5 = 8464.5

In the above table, differences in the yields of each plot from an assumed general
mean of 70 lb. are set up for each variety in each block. These differences are then
added up horizontally to obtain total differences per block and vertically to get the
total differences per variety. These totals are then squared and summed up so as
to yield total sums of squares for blocks and for varieties. As these differences are
taken from an assumed and not from the true mean, a correction is to be applied.
This correction factor is obtained by squaring the general total at the bottom of
column 6 (+3.0 in this case) and dividing it by the total number of plots in the
experiment. To obtain true sums of squares due to (1) blocks, (2) varieties and (3)
the total of the whole experiment, the correction factor must always be deducted
from the crude sums. The total sum of squares can be obtained by squaring each
individual difference and summing them together. So that we get:


$$\text{Correction factor} = \frac{(3)^2}{40} = 0.225$$

Sum of squares due to blocks (divided by the number of varieties):

$$\frac{11767.50}{4} - 0.225 = 2941.650$$

Sum of squares due to varieties (divided by the number of blocks):

$$\frac{45859}{10} - 0.225 = 4585.675$$

Total sum of squares = 8464.500 - 0.225 = 8464.275.
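
The arithmetic above can be checked directly from the plot deviations. The following is a minimal sketch; the deviations are transcribed from Table XXVII on the assumption that the block and varietal totals printed there are correct, and all variable names are illustrative.

```python
# Deviations from the assumed mean of 70 lb.; one row per block, columns A, B, C, D.
deviations = [
    [-19.0, -25.5, -2.5, -11.0],
    [-12.5, -23.0, 7.0, 4.0],
    [-5.5, -27.0, 10.0, 3.0],
    [-2.5, -21.5, 12.0, 1.5],
    [8.0, -30.0, 15.0, 14.5],
    [1.5, -21.0, 7.0, -3.0],
    [-4.5, -11.5, 17.5, 4.0],
    [10.5, -4.5, 24.5, 13.5],
    [21.0, 7.0, 31.0, 12.0],
    [-6.5, -5.5, 11.0, 4.0],
]
n_blocks = len(deviations)
n_varieties = len(deviations[0])
n_plots = n_blocks * n_varieties

block_totals = [sum(row) for row in deviations]
variety_totals = [sum(column) for column in zip(*deviations)]
general_total = sum(block_totals)

# Correction for working from an assumed rather than the true mean.
correction = general_total ** 2 / n_plots

ss_blocks = sum(t ** 2 for t in block_totals) / n_varieties - correction
ss_varieties = sum(t ** 2 for t in variety_totals) / n_blocks - correction
ss_total = sum(d ** 2 for row in deviations for d in row) - correction

print(correction, ss_blocks, ss_varieties, ss_total)
```

Each crude sum of squared totals is divided by the number of plots contributing to one total (4 for a block, 10 for a variety) before the correction is deducted.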


Analysis of Variance

The figures obtained above are now set up in a table of analysis of variance and
divided by their appropriate number of degrees of freedom so as to yield a measure
of variance (mean square) for each item. Since there are 40 plots to be considered
there are 40 - 1 or 39 total degrees of freedom. There being 10 blocks and 4 varieties
in the experiment the degrees of freedom for blocks is 10 - 1 or 9 and for varieties
it is 4 - 1 or 3. The degrees of freedom unaccounted for must be due to errors and
are in this case 39 - (9 + 3) = 27. The sum of squares due to error = total sum
of squares - (sum of squares due to blocks + sum of squares due to varieties).
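
The bookkeeping of degrees of freedom and mean squares described above can be sketched as follows. The sums of squares are those obtained earlier in this example; the variable names are illustrative only.

```python
n_blocks, n_varieties = 10, 4
ss_blocks, ss_varieties, ss_total = 2941.650, 4585.675, 8464.275

# Degrees of freedom: one less than the number of items in each group.
df_total = n_blocks * n_varieties - 1
df_blocks = n_blocks - 1
df_varieties = n_varieties - 1
df_error = df_total - (df_blocks + df_varieties)

# Error is whatever is left after blocks and varieties are removed.
ss_error = ss_total - (ss_blocks + ss_varieties)

mean_square = {
    "blocks": ss_blocks / df_blocks,
    "varieties": ss_varieties / df_varieties,
    "error": ss_error / df_error,
}

for name, value in mean_square.items():
    print(f"{name:10s} {value:10.3f}")
```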

Table XXVIII

Analysis of Variance

Due to        Degrees of freedom    Sum of squares    Mean square or variance

Blocks                 9               2941.650            326.850
Varieties              3               4585.675           1528.558 (V1)
Error                 27                936.950             34.702 (V2)

Total                 39               8464.275

Significance

The above table of analysis of variance expresses the total variance divided 
into three different components. In the first instance we have the variance due to 
blocks. Since each block contains the same 4 varieties and is of the same size the 
variability of the yields of blocks is an expression of the variation in fertility (soil- 
heterogeneity) in the experimental field. The variance due to varieties, on the other 
hand, should be an expression of the inherent differences in yielding power between 
varieties ; since each variety is distributed at random about the field in 10 different 
plots, this random distribution is designed to neutralize the effects of soil-hetero- 
geneity. The balance left after deducting the variance due to blocks and the 
variance due to varieties from the total variance of the experiment is the variance 
due to the chance errors of experiment, and this latter quantity furnishes a criterion 
for measuring the significance of the experimental results. If the variance due to 
varieties is significantly greater than the variance due to error or blocks, then
obviously the inherent differences between the yielding powers of the varieties have been
the dominating factor in the experiment. If the variance due to blocks is greater
than that due to varieties then the chief variable in the experiment has been the
soil-heterogeneity. If the variance due to error is about equal to or larger than the
variance due to varieties then the differences between the yielding powers of varieties
have not been significant.

Bearing this explanation in mind it can easily be understood that the comparison
of the variance due to varieties with that due to error, in other words, the ratio
$V_1 / V_2$, will determine the significance of the experiment.


Fisher's 'z' test

Fisher estimates significance by taking half the logarithms to the base e (see
appendix 1) of the variances which are to be compared and subtracting the value of
half log_e for error from that for treatments. Thus in this experiment

$$\tfrac{1}{2} \log_e V_1 = \tfrac{1}{2} \log_e 1528.558 = \tfrac{1}{2}(7.3321) = 3.6660$$

$$\tfrac{1}{2} \log_e V_2 = \tfrac{1}{2} \log_e 34.702 = \tfrac{1}{2}(3.5467) = 1.7733$$

$$\text{Difference} = 1.8927$$

By taking 1/2 log_e, we, of course, obtain the logarithm to the base e of the square root
of the variance, that is of the standard deviation. The number thus obtained is
now compared with the value of 'z' in Fisher's Table of 'z' for the appropriate
degrees of freedom. Fisher's Table of 'z' gives the distribution of the ratio
$\tfrac{1}{2} \log_e (V_1 / V_2)$
at the P = 0.01 and 0.05 levels of significance; Fisher's 'z' must carefully be
distinguished from Student's 'z'. In the present example the degrees of freedom
for varieties are 3 and for error are 27. Entering the 'z' table on the 1 per cent
level of significance with $n_1 = 3$ and $n_2 = 27$, we find z = 0.7631. The calculated
value of 'z' is, however, 1.8927 which is much higher than the value required
(0.7631) for significance at the 1 per cent level. We infer, therefore, that the
differences in the yielding powers of varieties are of such a size as would not occur
due to chance alone once in a hundred trials.
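
The calculation of 'z' lends itself to a short sketch. The two variances and the tabulated value 0.7631 are taken from the text; the function name `fisher_z` is hypothetical.

```python
import math


def fisher_z(v1, v2):
    """Half the natural log of each variance, error subtracted from treatments."""
    return 0.5 * math.log(v1) - 0.5 * math.log(v2)


variance_varieties = 1528.558  # V1, mean square for varieties
variance_error = 34.702        # V2, mean square for error
z_table_1_per_cent = 0.7631    # tabulated z for n1 = 3, n2 = 27, as quoted

z_observed = fisher_z(variance_varieties, variance_error)
is_significant = z_observed > z_table_1_per_cent

print(round(z_observed, 4), is_significant)
```

Working with full floating-point precision gives a value that agrees with the hand calculation to about three decimal places; the small discrepancy comes from the four-figure logarithms used in the text.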


Mahalanobis' 'x' test

With a view to simplifying calculations and avoiding the use of logarithms to
the base e, Mahalanobis has published auxiliary tables of Fisher's 'z' in which the
ratio $V_1 / V_2 = e^{2z}$ has been determined for degrees of freedom up to $n_1 = 60$
and $n_2 = \infty$, and is called x. In our example,

$$x = \frac{V_1}{V_2} = \frac{1528.558}{34.702} = 44.05$$

The expected value of x from Mahalanobis' Tables is 5.488 for $n_1 = 3$ and $n_2 = 27$
for the 1 per cent level of significance.

Our observed value of x = 44.05 being much greater than the expected value of
5.488, we can definitely conclude that the variance due to the varieties is
significant and that the four varieties in the experiment show significant differences
in their yielding powers.
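
The relation between the two tests can be illustrated briefly. In this sketch the variances and the expected value 5.488 are those quoted in the text; the names are illustrative, and the identity checked is simply that x equals e raised to the power 2z.

```python
import math

variance_varieties = 1528.558  # V1
variance_error = 34.702        # V2
x_expected_1_per_cent = 5.488  # tabulated value as quoted in the text

# Mahalanobis' x is the plain ratio of the two variances.
x_observed = variance_varieties / variance_error

# Fisher's z for the same data is half the natural log of that ratio.
z_observed = 0.5 * math.log(x_observed)
x_from_z = math.exp(2 * z_observed)

print(round(x_observed, 2), round(z_observed, 4))
print("significant" if x_observed > x_expected_1_per_cent else "not significant")
```

Because x is a monotonic function of z, the two tests necessarily lead to the same conclusion; the x form merely avoids looking up natural logarithms.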


Critical difference 


To obtain an idea of the comparative yielding power of each variety, we prepare 
a table of differences of mean yields and determine which of the differences are 
greater or less than the critical difference of the experiment. 

The critical difference is determined from Fisher's formula $t = \dfrac{d}{\text{S.E.}}$,
from which $d = t \times \text{S.E.}$

The standard error of the difference can then be determined from the variance
due to residual errors (see Table XXVIII) and the value of "t" for any level of
probability appropriate to the degrees of freedom for error can be obtained from
the table of "t". The critical difference is obtained from the product of these two
values.

The standard error of the difference is calculated as follows:

Variance due to error = 34.702

Variance for mean values of 10 replications = $\dfrac{34.702}{10}$

and the standard error of the difference between any two such values
$= \sqrt{\dfrac{34.702 \times 2}{10}} = 2.634$

Now for 27 degrees of freedom the values of "t" in Fisher's Table are:
2.771 for 0.01 level of significance
and 2.052 for 0.05 level of significance.

The critical value, d, in this experiment
= 2.771 X 2.634 or 7.2988 for 0.01 level of significance
and 2.052 X 2.634 or 5.4050 for 0.05 level of significance.
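
The critical difference can be reproduced as below. This is a sketch only: the error variance and the two values of "t" are those quoted in the text, the function name is hypothetical, and the standard error is rounded to three decimals before multiplying, as in the hand calculation.

```python
import math


def standard_error_of_difference(error_variance, replications):
    """S.E. of the difference between two means of `replications` plots each."""
    return math.sqrt(2 * error_variance / replications)


error_variance = 34.702
replications = 10
t_values = {"0.01": 2.771, "0.05": 2.052}  # Fisher's t for 27 degrees of freedom

se_difference = round(standard_error_of_difference(error_variance, replications), 3)
critical_difference = {
    level: round(t * se_difference, 4) for level, t in t_values.items()
}

print(se_difference, critical_difference)
```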

Table XXIX

Table of differences between mean yields of varieties in lb.

Varieties    A (control)      B          C          D       Mean yields    Percentage difference of
                                                              in lb.       mean yields from control (A)
    1             2           3          4          5           6                    7

    A            ..        -15.30     +14.20*     +5.20       69.05                  ..
    B         +15.30*        ..       +29.50*    +20.50*      53.75               -22.16
    C         -14.20       -29.50       ..        -9.00       83.25               +20.56
    D          -5.20       -20.50      +9.00*      ..         74.25                +7.53

Rank in
yielding power   III         IV          I          II

Positive differences which are statistically significant at the P = 0.01 level are marked with an asterisk.

Critical difference at P = 0.01 level = 7.2988

Critical difference at P = 0.01 level as percentage of mean yield of A
$= \dfrac{7.2988 \times 100}{69.05} = 10.57$ per cent.

In the last column (7) percentage differences larger than 10.57 are significant.

In the above table differences between the yields of any two varieties which
are as large as or larger than the critical difference should be taken to be
statistically significant. For the 1 per cent and the 5 per cent levels of significance,
the critical differences in this example are 7.2988 and 5.4050 respectively. So
that a difference as large as 5.405 lb. or larger will occur five times in 100 trials as
the result of chance errors of experiment and a difference of 7.2988 or larger will
occur only once in 100 trials as the result of chance errors. Statisticians have
adopted these two levels of significance, on the ground that if a particular event
occurs, by chance, only five times in a hundred trials, it may be expected that in
the remaining 95 cases, the event will occur due to the inherent property of the
variate under consideration. Similarly a more rigid test is the 1 per cent level
which shows that the event may occur only once in a hundred trials by chance and
in the remaining 99 trials it will occur because of an inherent property of the
treatment to give such a result.

For the sake of convenience, positive differences greater than the critical
difference may be underlined to show that these differences are statistically significant.
In the present case it will be noted that the mean yield of variety C is significantly
superior to the mean yields of all other varieties even at the 1 per cent level.
Similarly the mean yield of variety B is significantly inferior to the yields of all other
varieties at the 1 per cent level of significance. On the other hand, the mean yield
of variety D is significantly superior to variety B and inferior to variety C, and
although the yield of this variety is higher than that of variety A, the difference
of 5.20 lb. in their mean yields is not statistically significant.
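
The comparisons just described can be tabulated mechanically. The sketch below uses the mean yields and critical differences of this example; the helper `verdict` and the other names are hypothetical.

```python
from itertools import combinations

mean_yield = {"A": 69.05, "B": 53.75, "C": 83.25, "D": 74.25}  # lb. per plot
critical_1_per_cent = 7.2988
critical_5_per_cent = 5.4050


def verdict(difference):
    """Classify a difference of two means against the critical differences."""
    size = abs(difference)
    if size >= critical_1_per_cent:
        return "significant at 1 per cent"
    if size >= critical_5_per_cent:
        return "significant at 5 per cent"
    return "not significant"


results = {
    (first, second): verdict(mean_yield[first] - mean_yield[second])
    for first, second in combinations(sorted(mean_yield), 2)
}

# Critical difference expressed as a percentage of the control (A) mean.
critical_percentage = round(critical_1_per_cent * 100 / mean_yield["A"], 2)

for pair, outcome in results.items():
    print(pair, outcome)
print(critical_percentage)
```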

The mean yields of each variety are ranked according to their comparative 
merits. In this experiment Type A was an old established strain [Type 17] of 
known yielding power and the main object was to determine whether any of the 
other types were superior or inferior to Type A. 

The differences between each type and Type A, the control, may be conveniently 
expressed as percentages of the mean yield of the control and compared with the 
critical difference also expressed as a percentage of the mean yield of the control. 
This has been done in column 7 of Table XXIX. 

Fisher's z and Mahalanobis' x are tests of significance of an experiment
as a whole and before proceeding to make comparisons of individual yields some
such test of significance must be applied. A test of significance in an experiment
tells us whether the variance produced by treatments is significantly higher than
the variance due to residual error. It may happen that mean differences greater
than the critical difference will occur in experiments which are not indicated as
significant by the z or x tests, and Fisher and Wishart [1930] warn workers that
such mean differences are not to be taken as significant.


A YIELD TRIAL BY THE LATIN SQUARE METHOD

Example 21

Purpose of experiment: Preliminary varietal trial with wheat.

Field: Botanical Section, No. 10.

Varieties: A, B, C, D, E, F, G, and H (control).

Lay-out: Latin square, 8 X 8.

Size of plots: 18' X 18' = 324 sq. ft.

Size of ultimate plots after removing the necessary borders: 14' X 14' = 196
sq. ft. = 1/222 acre approximately.

The plots in this experiment are small owing to the number of types which had
to be handled at this stage of the plant-breeding work on wheat; the example,
however, serves to illustrate the statistical principles involved.

Fig. 19. Plan of wheat yield trial in a Latin Square.


It will be noted that each variety has been so randomized as to occur once only
in the direction of the rows and once only in that of the columns. The yield per
plot is recorded in ounces.

Table XXX shows the deviations of plot yields from an assumed mean of 90
ounces. An assumed mean is taken in order to avoid big decimal figures and a
final correction for this is made later on.

Table XXX

Deviations of plot yields from an assumed mean of 90 oz.

                              Columns                                Total deviation
Rows           1     2     3     4     5     6     7     8             for rows

1            +31   -10    -1     0    -3    -9    -2   -25               -19
2             -8   -16   +46   -14   -11   -16   -24   -15               -58
3             +6   -12   -34   -16    -4     0   -22   +10               -72
4             -6    +4    -2   +58   +23   -15   -20   +19               +61
5            -15   -21    +6    -1   +18   +53   +13    +7               +60
6            -27   -23   -22   +35   +66    -4    -6    -2               +17
7            -31   +40   -19   -10   +10    -5    +1   +22                +8
8             -3   -16   -22    -9    -3    +9   +46   +14               +16

Total deviations
for columns  -53   -54   -48   +43   +96   +13   -14   +30     +13 (General total)


To obtain an estimate of the total variance in the experiment the deviations 
of plot yields are squared and summed together as shown below : — 


Table XXXI

Squares of deviations given in Table XXX

                                 Columns                                     Total sum
Rows            1      2      3      4      5      6      7      8          of squares

1             961    100      1      0      9     81      4    625            1781
2              64    256   2116    196    121    256    576    225            3810
3              36    144   1156    256     16      0    484    100            2192
4              36     16      4   3364    529    225    400    361            4935
5             225    441     36      1    324   2809    169     49            4054
6             729    529    484   1225   4356     16     36      4            7379
7             961   1600    361    100    100     25      1    484            3632
8               9    256    484     81      9     81   2116    196            3232

Total sum
of squares   3021   3342   4642   5223   5464   3493   3786   2044     31015 (General total sum of squares)

Total sum of squares from assumed mean = 31015

Total sum of squares from true mean = 31015 - correction factor*
                                    = 31015 - 2.64 = 31012.36

* See page 110.

Similarly the deviations of plot yields for each variety are summed up for
the replications per variety and tabulated as shown below:


Table XXXII

Deviation of plot yields for each variety from assumed mean

                                    Varieties                                      Total deviations
Replications      A        B        C        D        E        F        G        H     for rows

1                 0       -9      +31      -10       -1       -3      -25       -2       -19
2                -8      -11      +46      -24      -15      -16      -14      -16       -58
3                 0      -34      +10       +6      -16      -12      -22       -4       -72
4                -2      -20      +58      +23       +4       -6      -15      +19       +61
5               +18       -1      +53       +6      +13       +7      -21      -15       +60
6               -23       -2      +66      +35       -4       -6      -27      -22       +17
7                +1      -31      +40      +22      +10      -10      -19       -5        +8
8               +14      -16      +46       +9       -3      -22       -3       -9       +16

Total deviation
for varieties     0     -124     +350      +67      -12      -68     -146      -54       +13 (General total)

Mean deviation    0   -15.50   +43.75   +8.375    -1.50    -8.50   -18.25    -6.75

Mean yield    90.00    74.50   133.75    98.38    88.50    81.50    71.75    83.25

Now the sums of squares for rows, columns and varieties are calculated as shown
below:


Table XXXIII

Sum of squares of deviations for rows, columns and varieties

                            Rows               Columns             Varieties
Replications              d      d^2          d      d^2          d       d^2

1                       -19      361        -53     2809          0         0
2                       -58     3364        -54     2916       -124     15376
3                       -72     5184        -48     2304       +350    122500
4                       +61     3721        +43     1849        +67      4489
5                       +60     3600        +96     9216        -12       144
6                       +17      289        +13      169        -68      4624
7                        +8       64        -14      196       -146     21316
8                       +16      256        +30      900        -54      2916

Total                    ..    16839         ..    20359         ..    171365

Divided by 8 to get
mean values              ..  2104.88         ..  2544.88         ..  21420.62

Subtract correction      ..     2.64         ..     2.64         ..      2.64

True sum of squares      ..  2102.24         ..  2542.24         ..  21417.98


The sums of squares for rows, columns and varieties are obtained by squaring
each total deviation and summing. These totals are then divided by the number of
replications and we get average sums of squares. As we arbitrarily took an assumed
mean of 90 oz. a correction has now to be applied to obtain the true sums of squares.
This correction is equal to the square of the general total [Table XXXII] divided
by the number of plots in the experiment. Thus

$$\text{Correction factor} = \frac{(13)^2}{64} = 2.64$$
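
The same steps can be sketched in code, starting from the totals for rows, columns and varieties. The totals are those of Tables XXX and XXXII; the function `true_sum_of_squares` is a hypothetical helper. Because the text rounds the intermediate figures to two decimals, the results may differ from the printed ones in the last place.

```python
row_totals = [-19, -58, -72, 61, 60, 17, 8, 16]
column_totals = [-53, -54, -48, 43, 96, 13, -14, 30]
variety_totals = [0, -124, 350, 67, -12, -68, -146, -54]

side = 8                         # an 8 x 8 Latin square
general_total = sum(row_totals)  # the same whichever way the plots are grouped
correction = general_total ** 2 / side ** 2


def true_sum_of_squares(totals):
    """Square each total, sum, divide by replications, deduct the correction."""
    return sum(t ** 2 for t in totals) / side - correction


ss_rows = true_sum_of_squares(row_totals)
ss_columns = true_sum_of_squares(column_totals)
ss_varieties = true_sum_of_squares(variety_totals)

print(round(correction, 2), round(ss_rows, 2), round(ss_columns, 2), round(ss_varieties, 2))
```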


Analysis of variance

Now the most important table, that of the analysis of variance, has to be set
up. This table gives the complete information that can be gathered from the
experiment. The total variation in the experiment can be divided into four parts
in which there are 64 - 1 or 63 degrees of freedom. For rows, columns, and
varieties each there are 8 - 1 = 7 degrees of freedom. The degrees of freedom for error
= total degrees of freedom - (sum of degrees of freedom for rows, columns and
varieties). In this case, it is 63 - (7 + 7 + 7) = 42. Similarly the sum of squares
for error is equal to the difference between the total sum of squares and the
summation of the sums of squares for rows, columns, and varieties. The mean squares
or variance for each of these groups of variables can be determined by dividing
the sums of squares by their appropriate degrees of freedom.

Table XXXIV

Analysis of variance

Due to       Degrees of    Sum of       Mean squares      1/2 log_e mean squares     Ratio V1/V2 for
             freedom       squares      or variance       for Fisher's z-test        Mahalanobis' x-test
   1             2            3              4                     5                         6

Rows             7         2102.24        300.32               2.85236
Columns          7         2542.24        363.18               2.94748
Varieties        7        21417.98       3059.71 (V1)          4.01337            3059.71 / 117.85 = 25.963
Error           42         4949.90        117.85 (V2)          2.3848

Total           63        31012.36          ..                   ..


Columns 1 to 4 in the above table are essential and either column 5 or 6 may
be retained according to whether the significance of the result is to be determined
by Fisher's original z-test or by its auxiliary test, the x-test, recently brought out
by Mahalanobis.
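
The construction of the analysis of variance table for the Latin Square can be sketched as follows. The sums of squares are those derived earlier in this example; the names are illustrative, and the last decimals may differ slightly from the printed table because the text works with rounded intermediate figures.

```python
import math

side = 8
sums_of_squares = {"rows": 2102.24, "columns": 2542.24, "varieties": 21417.98}
ss_total = 31012.36

# Each of rows, columns and varieties has (side - 1) degrees of freedom.
degrees = {name: side - 1 for name in sums_of_squares}
df_total = side * side - 1
df_error = df_total - sum(degrees.values())
ss_error = ss_total - sum(sums_of_squares.values())

mean_squares = {name: sums_of_squares[name] / degrees[name] for name in sums_of_squares}
mean_squares["error"] = ss_error / df_error

v1 = mean_squares["varieties"]
v2 = mean_squares["error"]
x_observed = v1 / v2                  # column 6: Mahalanobis' x
z_observed = 0.5 * math.log(v1 / v2)  # difference of the column 5 entries: Fisher's z

print(df_error, round(ss_error, 2), round(v1, 2), round(v2, 2))
print(round(x_observed, 3), round(z_observed, 4))
```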

Significance 

The same argument which was followed in the case of the randomized blocks
is now used in the interpretation of the Latin Square. If the mean square or
variance due to varieties or treatments is much greater than the mean square due to
"error" it may be concluded that the differences between the varieties or
treatments are significant. Similarly if the variance due to rows or to columns be very
much greater than the variance due to "error", large variations in soil-fertility
are obviously present. The advantage of the Latin Square lay-out lies in the fact
that it affords an elimination of soil-heterogeneity in two directions at right angles
to each other, i.e., in rows and in columns. In practice, therefore, we compare
the variance ($V_1$) due to treatments with the variance ($V_2$) due to error. This gives
a more precise measure of the significance of the treatments than the variance of
the whole experiment since the variability due to differences in soil fertility has
been eliminated.

Fisher's z-test

In our example, $n_1 = 7$ and $n_2 = 42$ and the observed value of z = 4.01337
- 2.3848 = 1.6286; the value of z from Fisher's tables for $n_1 = 6$ and $n_2 = 30$
(near about the true degrees of freedom)* is

0.6226 for the 1 per cent level (P = 0.01)
and 0.4420 for the 5 per cent level (P = 0.05).

* Fisher's z-table ends at $n_2 = 30$; by taking $n_2 = 30$ we are adopting an even stricter
criterion than if we took the actual value of z for $n_2 = 42$.

The observed value being greater than the stricter criterion at the 1 per cent level
it is concluded that there is a statistically significant difference between the different
varieties in the experiment.


Mahalanobis' x-test

$$\text{Observed value of } x = \frac{V_1}{V_2} = \frac{3059.71}{117.85} = 25.963$$

For $n_1 = 6$ and $n_2 = 30$
Expected value of x = 3.474 for P = 0.01
and x = 2.421 for P = 0.05

The observed ratio of 25.963 being greater than the expected ratio of 3.474 at the
P = 0.01 level it is concluded that the differences between the varieties are
statistically significant.

Having established that there are significant differences in the yielding powers
of different varieties, the varieties may be compared amongst themselves. This,
of course, is done by comparing the mean differences with the critical difference
of the experiment. Details regarding the calculation of the critical difference as
well as the tabulation of differences in the mean yields of varieties have been given
in the case of randomized blocks and need not be repeated here. In this case we
get:

Variance due to error = 117.85

Variance for mean values of 8 replications = $\dfrac{117.85}{8}$

and standard error of difference between any two such values
$= \sqrt{\dfrac{117.85 \times 2}{8}} = 5.43$.

The value of "t" for 42 degrees of freedom is not given in Fisher's "t"-table but
can easily be interpolated, or by taking the value given for 30 degrees of freedom
we can work with a stricter criterion.

For 30 degrees of freedom the value of "t" in Fisher's Tables is
2.750 for 0.01 level of significance and 2.042 for 0.05 level of significance.

Therefore the critical value d = t X S.E. = 2.750 X 5.43 = 14.9325 for 0.01 level
of significance

and d = 2.042 X 5.43 = 11.0880 for 0.05 level of significance.

Differences greater than these two critical differences for the 0.01 and 0.05 levels in
the table shown below are statistically significant. Those which fall below 11.0880
must be considered as not significant.

In the last column percentage differences greater than the critical difference
expressed as a percentage are significant. In the table, differences greater than the
critical difference for the 1 per cent level are marked in antique and differences
greater than the critical difference for the 5 per cent level are marked in italics.
The results are evident from a mere inspection of the above table.
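
As a rough illustration of how each variety may be compared with the control (H), the sketch below applies the two critical differences to the mean yields of Table XXXII. It is not a reproduction of the author's table of differences; the names and the classification helper are hypothetical, and the standard error is rounded to two decimals as in the text.

```python
import math

error_variance = 117.85
replications = 8
se_difference = round(math.sqrt(2 * error_variance / replications), 2)

t_1_per_cent, t_5_per_cent = 2.750, 2.042  # Fisher's t for 30 degrees of freedom
critical_1 = t_1_per_cent * se_difference
critical_5 = t_5_per_cent * se_difference

mean_yield = {  # ounces per plot
    "A": 90.00, "B": 74.50, "C": 133.75, "D": 98.38,
    "E": 88.50, "F": 81.50, "G": 71.75, "H": 83.25,
}
control = "H"


def level(variety):
    """Significance level of the difference between a variety and the control."""
    gap = abs(mean_yield[variety] - mean_yield[control])
    if gap > critical_1:
        return "1 per cent"
    if gap > critical_5:
        return "5 per cent"
    return "not significant"


levels = {v: level(v) for v in mean_yield if v != control}
for variety, outcome in levels.items():
    print(variety, round(mean_yield[variety] - mean_yield[control], 2), outcome)
```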


Fisher, R. A. (1932). Statistical Methods for Research Workers, 224-227.

Fisher, R. A. and Wishart, J. (1930). The Arrangement of Field Experiments and the
Statistical Reduction of the Results. Imp. Bur. of Soil Sci., Tech. Comm., 10, 15.

Love, H. H. (1924). A Modification of "Student's" z-Table for Use in Interpreting
Experimental Results. Jour. Amer. Soc. Agron., 16, 68-73.

Mahalanobis, P. C. (1932). Statistical Notes for Research Workers, No. 3, Auxiliary Tables
for Fisher's z-test in Analysis of Variance. Ind. Jour. Agri. Sci., 2, 686-689.

Shaw, F. J. F., and Bose, R. D. (1929). Yield Trials with Some Pusa Barleys. Agri. Jour.,

CHAPTER X

STATISTICAL INTERPRETATION OF COMPLEX AND SERIAL EXPERIMENTS

Complex experiments

The experiments detailed above and their interpretation by Fisher's analysis
of variance are confined to the investigation of only one variable factor. In
examples 20 and 21 we are concerned with the study of the yielding power of a
few varieties of gram and wheat, and in such experiments all the factors except
one, the kind of seed sown, are the same for the different plots. The land for such
an experiment is so chosen that the soil variability is as small as possible and all
plots are cultivated in the same manner. In short, within the limits of field
conditions, the different plots of the experiment have only one changing factor; such
an experiment, in which only one variable changes, is termed a simple experiment.
It sometimes happens that it is desirable to design an experiment to test two, three
or more variable factors. In such cases if we resort to the simple method we have
to perform a number of experiments each like the one detailed in example 20 or 21.
The space, the amount of labour and the expense involved in carrying out these
experiments are not commensurate with the advantage gained and to avoid such
difficulties, a complex experiment, either a Latin Square or a randomized block,
involving the number of factors in question is laid out. Such an experiment in
which we are concerned with the study of a number of factors simultaneously is
known as a complex experiment.

An elementary example of a complex experiment is furnished by the lay-out
of a field trial involving three varieties, C1, C2 and C3, and three manures, M1,
M2 and M3. The nine possible combinations resulting from the use of three varieties
and three manures are shown below and can be arranged in a 9 X 9 Latin Square.

Arrangement number      Variety      Manure

        1                 C1           M1
        2                 C2           M1
        3                 C2           M2
        4                 C3           M2
        5                 C1           M2
        6                 C3           M3
        7                 C3           M1
        8                 C2           M3
        9                 C1           M3

The arrangement of treatments in the Latin Square can be as follows:

Table XXXVI

3   9   2   5   1   8   4   7   6
4   5   7   6   8   2   1   9   3
8   7   1   2   6   4   9   3   5
1   2   3   7   4   9   6   5   8
7   4   8   3   5   1   2   6   9
2   1   6   9   3   5   7   8   4
5   3   9   1   7   6   8   4   2
9   6   4   8   2   3   5   1   7
6   8   5   4   9   7   3   2   1


In this Latin Square each variety occurs once with each manure in each row and
in each column. In this example there are only two factors under consideration.
Now if we have $n_1$ types of wheat, $n_2$ manures and $n_3$ methods of cultivation
we proceed just as in the previous case, but there will then be $n_1 \times n_2 \times n_3$
combinations and these would be arranged in a $(n_1 \times n_2 \times n_3)^2$ Latin Square. If
the product $n_1 \times n_2 \times n_3$ is large, the square of this becomes still larger, and
consequently the size of the field required for such an experiment will be very large. In
such cases it is better, therefore, to adopt the method of randomized blocks with
about ten replications.
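
Two points in the passage above can be illustrated with a short sketch: the defining property of a Latin Square (every arrangement number once in each row and once in each column) and the rapid growth in the number of plots as factors are added. The square is transcribed from Table XXXVI, reading its first row as 3 9 2 5 1 8 4 7 6; the function names are hypothetical.

```python
square = [
    [3, 9, 2, 5, 1, 8, 4, 7, 6],
    [4, 5, 7, 6, 8, 2, 1, 9, 3],
    [8, 7, 1, 2, 6, 4, 9, 3, 5],
    [1, 2, 3, 7, 4, 9, 6, 5, 8],
    [7, 4, 8, 3, 5, 1, 2, 6, 9],
    [2, 1, 6, 9, 3, 5, 7, 8, 4],
    [5, 3, 9, 1, 7, 6, 8, 4, 2],
    [9, 6, 4, 8, 2, 3, 5, 1, 7],
    [6, 8, 5, 4, 9, 7, 3, 2, 1],
]


def is_latin_square(grid):
    """True if every symbol 1..n occurs exactly once in each row and column."""
    size = len(grid)
    symbols = set(range(1, size + 1))
    rows_ok = all(len(row) == size and set(row) == symbols for row in grid)
    columns_ok = all(set(column) == symbols for column in zip(*grid))
    return rows_ok and columns_ok


def latin_square_plots(*levels):
    """Plots needed when every combination of factor levels is one treatment."""
    combinations = 1
    for count in levels:
        combinations *= count
    return combinations ** 2


print(is_latin_square(square))
print(latin_square_plots(3, 3))     # three varieties, three manures
print(latin_square_plots(3, 3, 2))  # the same with two methods of cultivation
```

Adding a third factor with only two levels already quadruples the number of plots, which is the practical reason given in the text for preferring randomized blocks.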

Example 22

As an example of a complex experiment we may consider an experiment on the
effect of four fungicides on the incidence of bunt (Tilletia indica) in three varieties
of wheat using two methods of infection [Mitra, 1935]. In such an experiment
there are three variables: fungicides, varieties of wheat and methods of infection.

The fungicides used were copper carbonate, ceresan, formalin and uspulun.
These four fungicides together with an untreated control make up five treatments.

The three varieties of wheat were:

Pusa types 111, 112 and 113.

The two methods of infection were:

A: naturally infected seed, and

B: naturally infected seed plus a dose of artificial infection.

Further details of the mycological side of this experiment will be found in the
author's original paper.

The lay-out was a randomized block with 8 replications. Each block contained
30 plots, 15 under infection series A and 15 under infection series B; 5 treatments
and 3 varieties in each series. At the time of harvest all the ears in each plot were
counted and the number of ears showing bunt noted down. The amounts of bunt
in each plot are given as percentages of the total number of ears in each plot and
are shown in Table XXXVII.

The statistical analysis of the data is done by Fisher's analysis of variance.
Here the total variance is analysed into variances due to blocks, infections, varieties,
interactions between the combinations of the three factors and the residual variance.
The crux of the problem is to find out the variances due to the several variables.
The total sum of squares is calculated as in the ordinary simple case. The variable
squared method of calculation being used, the correction factor to be applied is
$\dfrac{(270.44)^2}{240}$ or 304.740. The total sum of squares equals the sum of squares of the
percentage infections of different plots minus the correction factor:

$$(0.94)^2 + (0.80)^2 + (0.85)^2 + \dots + (0.27)^2 - \frac{(270.44)^2}{240} = 938.896 - 304.740 = 634.156$$


Since there are 30 observations in a block (15 under each method of infection)
the sum of squares of the block totals is to be divided by 30 and therefore the sum
of squares between blocks is equal to 1/30th of the sum of squares of the block
totals (ignoring effects of variety and treatment) minus the correction factor. Thus,

$$\tfrac{1}{30}\left\{(32.46)^2 + (29.40)^2 + (37.73)^2 + \dots + (33.8)^2\right\} - 304.740 = \frac{9241.3362}{30} - 304.74 = 3.305$$
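
The correction factor and the two sums of squares obtained so far can be recomputed as below. Since only some of the individual plot percentages and block totals are quoted in the text, this sketch starts from the crude sums given there (938.896 and 9241.3362); the variable names are illustrative, and small differences in the third decimal arise from rounding in the hand calculation.

```python
grand_total = 270.44             # sum of all percentage infections
n_plots = 240                    # 8 blocks x 30 plots
crude_ss_plots = 938.896         # sum of squares of the plot percentages
crude_ss_block_totals = 9241.3362
plots_per_block = 30

correction = grand_total ** 2 / n_plots
ss_total = crude_ss_plots - correction
ss_blocks = crude_ss_block_totals / plots_per_block - correction

print(round(correction, 3), round(ss_total, 3), round(ss_blocks, 3))
```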


The tables given below enable us to determine the sums of squares between treat- 
ments, varieties, infections and interactions between any two of these variables. 


Table XXXVIII

Treatments X Infections

            T. 1      T. 2      T. 3      T. 4      T. 5      Totals

A          50.36     12.65     14.56     21.16      8.56      107.29
B         116.59      9.19     11.80     17.20      8.37      163.15

Total     166.95     21.84     26.36     38.36     16.93      270.44

The first value, 50-36, in the first row and column is derived from the summation 
of the total percentage for 3 wheats under Treatment 1 (T.l), i.e., 6-79 +22-05-1-21-52 
60-36. 
Table XXXIX

Treatments × Varieties

            T. 1     T. 2     T. 3     T. 4     T. 5     Totals
P. 111      46.03     7.11     6.23    11.46     3.59     74.42
P. 112      76.02    10.08    15.49    16.17     5.64    123.40
P. 113      44.90     4.65     4.64    10.73     7.70     72.62
Total      166.95    21.84    26.36    38.36    16.93    270.44


The figure 46.03 is the summation of all the percentage figures for P. 111 under
control (T. 1).


Table XL

Varieties × Infections

            P. 111    P. 112    P. 113    Totals
A            25.13     47.50     34.66    107.29
B            49.29     75.90     37.96    163.15
Total        74.42    123.40     72.62    270.44


The figure 25.13 is derived from the summation of the percentages for P. 111 under
all the treatments in Series A, viz.,

6.79 + 4.72 + 2.59 + 9.97 + 1.06 = 25.13.

Sum of squares due to treatments

$$= \frac{\text{Sum of squares of treatment totals}}{48\ (\text{i.e., the number of plots for each treatment})} - \text{correction factor}$$

$$= \frac{(166.95)^2 + (21.84)^2 + (26.36)^2 + (38.36)^2 + (16.93)^2}{48} - 304.740 = \frac{30802.2522}{48} - 304.740 = 336.974$$

with 4 degrees of freedom.
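The same rule (square the totals, divide by the number of plots behind each total, subtract the correction factor) serves for every main effect. A minimal Python sketch, assuming the treatment totals of Table XXXVIII; the helper name `main_effect_ss` is hypothetical and not taken from the handbook.

```python
def main_effect_ss(totals, plots_per_total, cf):
    """Sum of squares for one factor computed from its level totals."""
    return sum(t ** 2 for t in totals) / plots_per_total - cf

cf = 270.44 ** 2 / 240                                   # correction factor
treatment_totals = [166.95, 21.84, 26.36, 38.36, 16.93]  # T. 1 to T. 5

ss_treatments = main_effect_ss(treatment_totals, 48, cf)
df_treatments = len(treatment_totals) - 1

print(round(ss_treatments, 3), df_treatments)
```

Passing the two infection totals with 120 plots each, or the three variety totals with 80 plots each, gives the other main effects in the same way.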
The sum of squares due to infections

$$= \frac{\text{Sum of squares of infection totals}}{120\ (\text{i.e., number of plots for each infection})} - \text{correction factor}$$

$$= \frac{(107.29)^2 + (163.15)^2}{120} - 304.740 = 13.002,$$

the degree of freedom being 1.


Interaction between treatments and infections is equal to the sum of squares
due to treatments × infections − (Sum of squares due to treatments + Sum of
squares due to infections). Sum of squares for combined treatments × infections

$$= \tfrac{1}{24}\left\{(50.36)^2 + (12.65)^2 + (14.56)^2 + (21.16)^2 + (8.56)^2 + (116.59)^2 + (9.19)^2 + (11.80)^2 + (17.20)^2 + (8.37)^2\right\} - 304.740 = 429.093.$$

Hence the interaction between treatments × infections = 429.093 − 336.974
− 13.002 = 79.117 with (9 − 4 − 1 or 4) degrees of freedom.
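The subtraction that isolates a first-order interaction can be sketched in Python as follows. The cell totals are those of Table XXXVIII; the dictionary layout and function name are assumptions made for illustration.

```python
# Cell totals (rows: infections A and B; columns: T. 1 to T. 5).
cells = {
    "A": [50.36, 12.65, 14.56, 21.16, 8.56],
    "B": [116.59, 9.19, 11.80, 17.20, 8.37],
}
n_plots = 240
plots_per_cell = 24        # 8 blocks x 3 varieties behind each cell total

grand_total = sum(sum(row) for row in cells.values())
cf = grand_total ** 2 / n_plots

def ss_from_totals(totals, plots_per_total):
    """Crude sum of squares of the totals, scaled, minus the correction factor."""
    return sum(t ** 2 for t in totals) / plots_per_total - cf

infection_totals = [sum(row) for row in cells.values()]
treatment_totals = [sum(col) for col in zip(*cells.values())]

ss_combined = ss_from_totals([c for row in cells.values() for c in row], plots_per_cell)
ss_infections = ss_from_totals(infection_totals, n_plots // 2)
ss_treatments = ss_from_totals(treatment_totals, n_plots // 5)

# Interaction = combined effect minus the two main effects.
ss_interaction = ss_combined - ss_treatments - ss_infections
df_interaction = (5 - 1) * (2 - 1)

print(round(ss_combined, 3), round(ss_interaction, 3), df_interaction)
```

The degrees of freedom follow the same pattern as the sums of squares: 9 for the ten cells, less 4 for treatments and 1 for infections.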

Tables XXXIX and XL enable us to calculate the sum of squares between 
varieties and also the interactions between varieties X treatments and varieties X 
infections. 

Sum of squares between varieties

$$= \frac{(74.42)^2 + (123.40)^2 + (72.62)^2}{80\ (\text{i.e., the number of plots under each variety})} - 304.740 = 20.7545.$$

Total sum of squares for treatments × varieties (ignoring infection and block
effects)

$$= \frac{(46.03)^2 + (7.11)^2 + \dots + (7.70)^2}{16\ (\text{i.e., 8 blocks with 2 infections in each block})} - 304.740 = \frac{10999.746}{16} - 304.740 = 382.747.$$


Interaction between treatments × varieties = Total sum of squares − (Sum
of squares for treatments + Sum of squares for varieties), having (14 − 4 − 2) = 8
degrees of freedom, i.e., equal to 382.747 − 20.7545 − 336.974 = 25.019.

To find the interaction between varieties and infections, we make use of Table XL.
Interaction between varieties and infections is equal to total sum of squares − (Sum
of squares due to varieties + Sum of squares due to infections).

Sum of squares for combined varieties × infections

$$= \frac{(25.13)^2 + (47.50)^2 + (34.66)^2 + (49.29)^2 + (75.90)^2 + (37.96)^2}{40\ (\text{i.e., 5 treatments and 8 blocks})} - 304.740 = 38.2675.$$

Interaction = 38.2675 − 20.7545 − 13.002
= 4.511, having (5 − 2 − 1 or 2) degrees of freedom.

Finally, there remains to be determined the interaction between the three variables.
This is equal to the total sum of squares (ignoring block effect) minus the sum of
variances due to treatments, varieties, infections and interactions between the
three quantities taken two at a time, having (29 − 4 − 1 − 2 − 4 − 8 − 2 or 8) degrees
of freedom.

Total sum of squares for treatments × varieties × infections (excluding block
effect)

$$= \frac{(6.79)^2 + (22.05)^2 + (21.52)^2 + (4.72)^2 + \dots + (3.70)^2}{8\ (\text{i.e., number of blocks})} - 304.740 = 520.612.$$

Therefore, interaction between treatments × infections × varieties
= 520.612 − (336.974 + 13.002 + 20.755 + 79.117 + 25.019 + 4.51)
= 41.234 with 8 degrees of freedom.

The difference between the total sum of squares and the sum of all the other 
sums of squares gives the residual, having (239 — 7 — 29 or 203) degrees of freedom. 
The final analysis of variance may now be tabulated as follows:


Table XLI

Analysis of Variance

Due to                                           Degrees of   Sum of     Mean      Observed value of   Critical value of
                                                 freedom      squares    square    Mahalanobis' x      Mahalanobis' x (P = 0.01)
Blocks                                               7          3.305     0.472         . .                 . .
Treatments (control and fungicides)                  4        336.974    84.244       155.144               3.320
Infections (A and B)                                 1         13.002    13.002        23.940               6.635
Varieties (wheat types P. 111, P. 112, P. 113)       2         20.755    10.378        19.111               4.605
Interactions:
  Treatments × Infections                            4         79.117    19.780        36.420               3.320
  Treatments × Varieties                             8         25.019     3.127         5.760               2.511
  Infections × Varieties                             2          4.511     2.255         4.153               4.605
  Treatments × Varieties × Infections                8         41.234     5.154         9.492               2.511
Residual error                                     203        110.239     0.543
Total                                              239        634.156

It will be seen from the above table that the calculated value of x exceeds the
theoretical value given in Mahalanobis' tables at the 1 per cent level of significance in all
items except blocks, showing thereby that the differences between the several treatments,
infections and varieties are statistically significant. Further, it may be
noted that the interactions between the various combinations of the three factors
under consideration are also significant.
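The last three numerical columns of the analysis of variance follow mechanically from the first two. The Python sketch below assumes the degrees of freedom and sums of squares quoted in the text; the residual is obtained by subtraction and each observed ratio x is the item's mean square divided by the residual mean square. The dictionary keys are illustrative labels only.

```python
# (degrees of freedom, sum of squares) for every item except the residual.
items = {
    "Blocks": (7, 3.305),
    "Treatments": (4, 336.974),
    "Infections": (1, 13.002),
    "Varieties": (2, 20.755),
    "Treatments x Infections": (4, 79.117),
    "Treatments x Varieties": (8, 25.019),
    "Infections x Varieties": (2, 4.511),
    "Treatments x Varieties x Infections": (8, 41.234),
}
total_df, total_ss = 239, 634.156

# Residual error is whatever the listed items leave unexplained.
resid_df = total_df - sum(df for df, _ in items.values())
resid_ss = total_ss - sum(ss for _, ss in items.values())
resid_ms = resid_ss / resid_df

# Observed ratio of each mean square to the residual mean square.
ratios = {name: (ss / df) / resid_ms for name, (df, ss) in items.items()}

for name, x in ratios.items():
    print(f"{name:38s} {x:9.3f}")
```

Small differences from the printed ratios arise because the handbook divides by the residual mean square already rounded to 0.543.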

After having established that there are significant differences in percentage
attack by bunt on treatments, infections and varieties, let us now compare the
effect of the fungal attack within every one of the factors under discussion.

To compare and to arrange the five treatments, the two infections and the three
varieties in order of merit in respect of bunt attack, we have first to calculate the
critical differences for treatments, infections and varieties. Then a table of differences
between the mean percentage of attack due to bunt is drawn up for every
one of the factors under study.

With the help of the critical differences for the respective factors these tables 
will enable us to judge the merits of (1) the various treatments, (2) the different 
kinds of infections and (3) different varieties used for the experiment. 

Table XLII gives the differences between the means of the different treatments, 
control, copper carbonate, ceresan, formalin and uspulun. The critical differences 
for treatments, etc., have been calculated and given below the respective tables. 

Table XLII

Table of differences between the mean values for treatments

Treatments     T. 1       T. 2       T. 3       T. 4       T. 5
T. 1           . .      − 3.023    − 2.929    − 2.679    − 3.125
T. 2         + 3.023      . .      + 0.094    + 0.344    − 0.102
T. 3         + 2.929    − 0.094      . .      + 0.250    − 0.196
T. 4         + 2.679    − 0.344    − 0.250      . .      − 0.446
T. 5         + 3.125    + 0.102    + 0.196    + 0.446      . .
Mean           3.478      0.455      0.549      0.799      0.353

$$\text{Critical difference} = \sqrt{\frac{2 \times 0.543}{48}} \times t = 0.304 \quad (P = 0.05)$$

$$\text{Critical difference} = 0.401 \quad (P = 0.01)$$

A comparison of the various figures of the table with the critical differences leads 
us to the following conclusions : — 

That T. 1 (control) differs from all other treatments in having significantly 
higher percentage of bunt infection. 

That T. 4 (formalin) has significantly higher infection than T. 2 (copper car- 
bonate) at P = 0-05 and T. 5 (uspulun) at the higher level, P = 0-01. 

Or, in other words, we may conclude that 

T. 5 (uspulun) is superior to T. 4 (formalin) at the 1 per cent level ;

T. 2 (copper carbonate) is superior to T. 4 (formalin) at the 5 per cent level ;

T. 2 (copper carbonate), T. 3 (ceresan), T. 4 (formalin) and T. 5 (uspulun) are
superior to T. 1 (control or no disinfectant) at the 1 per cent level.

There is no significant difference between other combinations. 
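The pairwise comparison against the two critical differences can be sketched in Python. The treatment totals, the 48 plots per treatment and the critical differences 0.304 and 0.401 are taken from the text; the function `significance` and the labels it returns are assumptions made for illustration.

```python
from itertools import combinations

# Treatment totals over 48 plots each.
totals = {"T.1": 166.95, "T.2": 21.84, "T.3": 26.36, "T.4": 38.36, "T.5": 16.93}
plots_per_treatment = 48
cd_5pc, cd_1pc = 0.304, 0.401      # critical differences quoted in the text

means = {name: total / plots_per_treatment for name, total in totals.items()}

def significance(diff):
    """Classify a difference between two means against the critical differences."""
    size = abs(diff)
    if size >= cd_1pc:
        return "1 per cent"
    if size >= cd_5pc:
        return "5 per cent"
    return "not significant"

verdicts = {
    (a, b): significance(means[a] - means[b])
    for a, b in combinations(means, 2)
}

for (a, b), verdict in verdicts.items():
    print(f"{a} vs {b}: {means[a] - means[b]:+.3f}  {verdict}")
```

Every comparison involving T.1 falls at the 1 per cent level, T.4 differs from T.2 at the 5 per cent level and from T.5 at the 1 per cent level, and the remaining pairs are not significant.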
In Table XLIII, we get the percentage differences of bunt infection between the
means of the different varieties P. 111, P. 112 and P. 113.

Table XLIII

Table of differences between the mean values for varieties

Varieties     P. 111     P. 112     P. 113
P. 111         . .      + 0.613    − 0.022
P. 112       − 0.613      . .      − 0.635
P. 113       + 0.022    + 0.635      . .
Mean           0.930      1.543      0.908

$$\text{Critical difference} = \sqrt{\frac{2 \times 0.543}{80}} \times t = 0.229 \quad (P = 0.05)$$

$$\text{Critical difference} = 0.303 \quad (P = 0.01)$$

From the table it is evident that P. 112 is inferior to P. 111 and P. 113 at the
1 per cent level in respect of susceptibility to bunt.

The comparison of infections is quite easy in this particular case, for we have
only two kinds of infections.

Mean of A . . . . . . 0.893

Mean of B . . . . . . 1.360

Mean difference . . . . . 0.467

$$\text{Critical difference} = \sqrt{\frac{2 \times 0.543}{120}} \times t = 0.247 \quad (P = 0.01),$$

which shows that infection is significantly higher in the B series than in the A
series.

Serial experiments 

In the case of complex experiments we have seen that an experiment can test 
the effect of more than one variable, such as the combined effect of different manures 
and different varieties. It has been previously mentioned that conclusions based 
upon a single experiment conducted in one season may not be supported when that 
experiment is repeated in another season or under different climatic conditions. 
It is quite possible, for instance, that different treatments may respond differentially 
when subjected to varying climatic conditions and for this reason a repetition of 
the experiment in at least three successive seasons is desirable. An investigation 
which is repeated in several successive seasons is termed a Serial Experiment 
and the accumulated results of such successive experiments are combined to give a 
more exact evaluation of the different treatments. 

Example 23. The oat trial at Pusa furnishes an excellent example of the utility
of such accumulated data in performing a Serial Trial [Bose, 193^.

Eleven hybrid oats, A, B, C, D, E, F, G, H, I, J and K, which were evolved
at Pusa by hybridizing Scotch Potato oats with two Indian oats [Shaw and Bose,
1933], were selected for comparison with two established high-yielding Pusa selections,
B. S. 1 (marked L in the following pages) and B. S. 2 (marked M). Details
regarding B. S. 1 and 2 have already been published [Shaw and Bose, 1933].

In a preliminary trial at Pusa conducted in 1930-31, small areas of equal sizes,
under a large number of oats, were harvested from duplicate plots of each type.
The oat hybrids and selections enumerated above showed great promise and in
Table XLIV the average yields and the rank attained by each type are given.

Table XLIV

Mean yields of Pusa oats obtained from duplicate plots each measuring 50' × 8'

Variety         Yield in lb.    Rank
A                  11.55         12
B                  13.05          7
C                  15.36          1
D                  15.15          3
E                  15.35          2
F                  14.55          4
G                   . .           5
H                  12.75         10
I                   . .           9
J                  13.85          6
K                  12.90          8
L (B. S. 1)        11.75         11
M (B. S. 2)        11.45         13


In the following year, 1931-32, randomized blocks with five replications of each
type were laid out simultaneously at Pusa and Karnal with these thirteen oats.
The usual soil and cultural conditions required for oats in this country were given
and soils of average fertility, without the application of any manures, were used.
The main variable factor in the two localities was that at Pusa the crop was grown
under barani conditions, i.e., without any irrigation, whereas at Karnal the crop
usually received two to three irrigations. The yields of these thirteen types of
oats taken from five plots, each 1,000 square feet in area, obtained at Pusa and at
Karnal are recorded with their respective merits in Table XLV.

Table XLV

Total yields of Pusa oats taken from five plots, each 1,000 square feet in area (1931-32)

                     Pusa                      Karnal
Variety       Yield in lb.   Rank       Yield in lb.   Rank
A                127.5         9           206.5         4
B                123.5        10           146.5        13
C                122.0        11           210.5         3
D                131.5         8           203.0         5
E                 . .          5            . .          8
F                174.0         2           172.0        . .
G                151.0         6           218.5         2
H                111.0        13           182.5         7
I                116.5        12           104.0        11
J                150.0         4           221.5         1
K                136.0         7           197.0         6
L                221.5         1           150.0        12
M                161.0         3           178.0         9
By comparing Tables XLIV and XLV, it will be noted that the yielding powers of
the different oats at Pusa in 1931-32 and 1930-31 were not at all similar. Whereas
L (B. S. 1) ranked as the highest yielder in 1931-32, it was indeed a poor
yielder in 1930-31. Under Karnal conditions, however, this very type
of oats ranked very low in 1931-32, a result which was in conformity with that
obtained at Pusa in 1930-31. It is evident from Table XLV that if recommendations
had been based on the results of the Pusa trial conducted during this single season,
we should certainly speak very highly of L (B. S. 1) and reject hybrid C for
its poor yields. If the recommendations, on the other hand, were based on Karnal
results of 1931-32 we should form just the opposite view. This proves definitely
how erroneous it would be to base conclusions on the experience of trials conducted
in one season or in one locality.

This yield trial experiment was, therefore, repeated in these two localities in the
rabi seasons of 1931-32, 1932-33, and 1933-34, and Table XLVI shows the yields
per variety obtained from plots 1,000 square feet in area. The size of the plot
varied somewhat from year to year, and from locality to locality, depending on
the size of the field available for the experiment. To maintain uniformity the plot
yields in the table under reference have been calculated on the basis of areas of
1,000 square feet, which is roughly equal to 1/44th of an acre.


It will be noted from the Table XLVI that the variables under consideration 
are : — 

(1) Localities: Pusa and Karnal.

(2) Seasons: 1931-32, 1932-33, and 1933-34.

(3) Varieties: Hybrids A to K and types B. S. 1 (L) and B. S. 2 (M) oats.

(4) Blocks: 5 replications of each type.

The method of obtaining the analysis of variance is the same as is employed in 
the case of complex experiments. In the present example the variable-squared 
method has been used in determining the sums of squares. 

The grand total of the whole experiment being 16,309 and there being 390 plots
in the whole series of experiments, the correction factor (c.f.) is equal to

$$\frac{(16309)^2}{390} = 682008.9256,$$

a sum which has to be deducted from all crude sums of squares to enable us to get
the true sum of squares for each particular item.

The total sum of squares for the whole experiment as shown in the last column of
Table XLVI is the total of these figures, 734690 minus 682008.9256 (c.f.) or
52681.0744.
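The correction factor and total sum of squares for the serial experiment can be verified in a few lines of Python. The grand total (16,309), the number of plots (390) and the crude sum of squares (734,690) are the figures given in the text; the variable names are assumptions.

```python
grand_total = 16309        # total yield in lb. over the whole series
n_plots = 390              # 13 varieties x 5 blocks x 3 seasons x 2 localities
crude_ss = 734690          # sum of the squared plot yields

cf = grand_total ** 2 / n_plots     # correction factor (c.f.)
total_ss = crude_ss - cf            # total sum of squares

print(round(cf, 4), round(total_ss, 4))
```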

Table XLVI

Yields of Pusa oats in lb. from plots 1,000 square feet in area

Varietal totals for Pusa (A to M): 534.0, 445.0, 757.0, 606.5, 656.5, 652.5, 703.5, 623.5, 639.0, 723.0, 651.0, 780.5, 766.0; total 8,338.0
Sum of squares due to varieties, ignoring the effects of all other variables, is the
sum of squares of total yields of each variety divided by the number of plots* contributing
to each total, minus the correction factor

$$= \frac{(1140.5)^2 + (878.0)^2 + (1424.5)^2 + \dots + (1484.5)^2}{30} - \text{c.f.} = 12297.1744$$

(see the varietal totals at the bottom of the table).

Similarly sum of squares due to seasons

$$= \frac{(4326.5)^2 + (5860.5)^2 + (6122.0)^2}{130} - \text{c.f.} = 14475.2784$$

(see bottom of Table XLVIII).

Sum of squares due to localities

$$= \frac{(8338)^2 + (7971)^2}{195} - \text{c.f.} = 345.3564$$

(see bottom of Table XLVII),

and finally the sum of squares due to blocks

$$= \frac{(3196.0)^2 + (3236.5)^2 + \dots + (3281.0)^2}{78} - \text{c.f.} = 197.7744$$

(also see Table XLVII).

Having determined the sums of squares for the individual items shown above,
it remains for us to calculate the interactions between any two of these variables
taken together and finally the interactions between any three of these items combined
together. The sum of squares contributed by all these components deducted
from the total sum of squares will ultimately yield the sum of squares for the residual
error.
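The four main-effect sums of squares share one formula, differing only in the totals used and in the number of plots behind each total. A minimal Python sketch, assuming the totals quoted in the text; `ss_from_totals` is a hypothetical helper name.

```python
cf = 16309 ** 2 / 390     # correction factor for the whole series

def ss_from_totals(totals, plots_per_total):
    """Sum of squared totals over plots per total, minus the correction factor."""
    return sum(t ** 2 for t in totals) / plots_per_total - cf

variety_totals = [1140.5, 878.0, 1424.5, 1232.0, 1266.5, 1221.5, 1391.0,
                  1070.0, 1197.5, 1423.5, 1142.0, 1437.5, 1484.5]   # 30 plots each
season_totals = [4326.5, 5860.5, 6122.0]                           # 130 plots each
locality_totals = [8338.0, 7971.0]                                 # 195 plots each
block_totals = [3196.0, 3236.5, 3236.5, 3359.0, 3281.0]            # 78 plots each

ss_varieties = ss_from_totals(variety_totals, 30)
ss_seasons = ss_from_totals(season_totals, 130)
ss_localities = ss_from_totals(locality_totals, 195)
ss_blocks = ss_from_totals(block_totals, 78)

print(round(ss_varieties, 4), round(ss_seasons, 4),
      round(ss_localities, 4), round(ss_blocks, 4))
```

In each case the divisor is 390 divided by the number of levels of the factor, since every level total pools the plots of all the other factors.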

INTERACTIONS

The combined effects of any two variables taken at a time enable us to study
whether these have or have not influenced the yields more than would result from the
simple additive effects of these two components. Thus in Table XLVII we see
that the sum of squares due to the combined effects of blocks and localities is
equal to 1278.3544, which is much greater than the additive effect of these two
variables as represented by their sums of squares, i.e., 197.7744 + 345.3564 or
543.1308. This suggests that the interaction of blocks and localities has exerted a
definite influence on the variability of the experiment.

The interaction between any two variables such as blocks and localities is indicated
by blocks × localities and is calculated as follows:

Interaction between blocks × localities = S. S. (blocks × localities) − (S. S.
blocks + S. S. localities), where S. S. represents the sum of squares of a particular
variable.

* In determining sums of squares of any variable, the squares of the individual totals are added
together and always divided by the number of plots which make up each of
these totals, and from this, of course, the correction factor is deducted.
Interactions between two variables taken at a time are shown in Tables XLVII
to LII.

Table XLVII

Combined blocks × localities

Blocks               Pusa      Karnal    Block totals
1                   1707.5     1488.5      3196.0
2                   1711.5     1525.0      3236.5
3                   1620.0     1616.5      3236.5
4                   1678.0     1681.0      3359.0
5                   1621.0     1660.0      3281.0
Locality totals     8338.0     7971.0     16309.0

The block yields in each locality in the above table have been derived by adding
together block yields of the three seasons in each locality. Thus the first block yield
for Pusa, viz., 1707.5 = 466.0 + 615.5 + 626.0 lb.

Total sum of squares for blocks × localities

$$= \frac{(1707.5)^2 + (1488.5)^2 + \dots + (1660.0)^2}{39} - \text{c.f.} = \frac{26648204.00}{39} - \text{c.f.} = 1278.3544.$$

Sum of squares for blocks

$$= \frac{(3196.0)^2 + (3236.5)^2 + \dots + (3281.0)^2}{78} - \text{c.f.} = \frac{53212122.50}{78} - \text{c.f.} = 197.7744.$$

Sum of squares for localities

$$= \frac{(8338.0)^2 + (7971.0)^2}{195} - \text{c.f.} = \frac{133059085.00}{195} - \text{c.f.} = 345.3564.$$

Interaction between blocks × localities
= S. S. blocks × localities − (S. S. blocks + S. S. localities)
= 1278.3544 − (197.7744 + 345.3564) = 735.2236
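The whole blocks × localities calculation can be reproduced from the ten cell totals alone, since the block and locality totals are simply the row and column sums. A Python sketch under that assumption; the cell totals are those of Table XLVII (each the sum of 39 plots) and the names are illustrative.

```python
# Block x locality cell totals: block number -> (Pusa, Karnal).
cells = {
    1: (1707.5, 1488.5),
    2: (1711.5, 1525.0),
    3: (1620.0, 1616.5),
    4: (1678.0, 1681.0),
    5: (1621.0, 1660.0),
}
n_plots = 390
cf = sum(sum(row) for row in cells.values()) ** 2 / n_plots

def ss_from_totals(totals, plots_per_total):
    """Sum of squared totals over plots per total, minus the correction factor."""
    return sum(t ** 2 for t in totals) / plots_per_total - cf

block_totals = [sum(row) for row in cells.values()]          # 78 plots each
locality_totals = [sum(col) for col in zip(*cells.values())] # 195 plots each

ss_combined = ss_from_totals([c for row in cells.values() for c in row], 39)
ss_blocks = ss_from_totals(block_totals, 78)
ss_localities = ss_from_totals(locality_totals, 195)
ss_interaction = ss_combined - (ss_blocks + ss_localities)

print(round(ss_combined, 4), round(ss_interaction, 4))
```

The same three-step pattern (combined sum of squares, the two main effects, then the difference) applies to each of the six first-order interactions.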
Table XLVIII shows the combined effect of blocks × seasons.

Table XLVIII

Combined blocks × seasons

Blocks               1931-32    1932-33    1933-34    Block totals
1                     991.5     1104.5     1100.0       3196.0
2                     950.0     1129.0     1157.5       3236.5
3                     803.5     1203.5     1229.5       3236.5
4                     802.0     1239.5     1317.5       3359.0
5                     779.5     1184.0     1317.5       3281.0
Seasonal totals      4326.5     5860.5     6122.0      16309.0

The block yields per season in this table have been obtained by summing the
block yields in the two localities for each season. Thus 991.5 = 466.0 + 525.5.

Total sum of squares for blocks × seasons

$$= \frac{(991.5)^2 + (1104.5)^2 + \dots + (1317.5)^2}{26} - \text{c.f.} = \frac{18196287.50}{26} - \text{c.f.} = 17848.2844.$$

Sum of squares for blocks as already determined = 197.7744.

Sum of squares for seasons

$$= \frac{(4326.5)^2 + (5860.5)^2 + (6122.0)^2}{130} - \text{c.f.} = \frac{90542946.50}{130} - \text{c.f.} = 14475.2784.$$

Interaction between blocks × seasons
= S. S. blocks × seasons − (S. S. blocks + S. S. seasons)
= 17848.2844 − (197.7744 + 14475.2784) = 3175.2316

The combined effect of blocks × varieties is shown in Table XLIX.














Table XLIX

Combined blocks × varieties

Varietal totals (A to M): 1140.5, 878.0, 1424.5, 1232.0, 1266.5, 1221.5, 1391.0, 1070.0, 1197.5, 1423.5, 1142.0, 1437.5, 1484.5; grand total 16309.0
The block yields per variety in the above table have been obtained by summing
together the plot yields of each variety, in each block, in both the localities, in
all three seasons. Thus 216.0 in the first row and first column is equal to 26.5 +
37.5 + 37.5 + 44.5 + 29.0 + 40.0.

Total sum of squares for blocks × varieties

$$= \frac{(216.0)^2 + (174.0)^2 + \dots + (300.5)^2}{6} - \text{c.f.} = \frac{4173698.00}{6} - \text{c.f.} = 13607.4077.$$

Sum of squares for blocks = 197.7744.

Sum of squares for varieties

$$= \frac{(1140.5)^2 + (878.0)^2 + \dots + (1484.5)^2}{30} - \text{c.f.} = \frac{20829183.00}{30} - \text{c.f.} = 12297.1744.$$

Interaction between blocks × varieties
= S. S. blocks × varieties − (S. S. varieties + S. S. blocks)
= 13607.4077 − (197.7744 + 12297.1744) = 1112.4589

The combined effect of seasons × localities is shown in the following table:

Table L

Combined seasons × localities

Localities           1931-32    1932-33    1933-34    Locality totals
Pusa                 1889.0     3048.5     3400.5        8338.0
Karnal               2437.5     2812.0     2721.5        7971.0
Seasonal totals      4326.5     5860.5     6122.0       16309.0

The seasonal yields in the table represent the total yields of all varieties per
season in each locality.

Total sum of squares for seasons × localities

$$= \frac{(1889.0)^2 + (3048.5)^2 + \dots + (2721.5)^2}{65} - \text{c.f.} = \frac{45680386.00}{65} - \text{c.f.} = 20766.2444.$$

Sums of squares for seasons and localities are 14475.2784 and 345.3564 respectively.
∴ Interaction between seasons and localities
= S. S. seasons × localities − (S. S. seasons + S. S. localities)
= 20766.2444 − (14475.2784 + 345.3564) = 5945.6096

In a like manner the combined effects of varieties × seasons are tabulated
in Table LI.










Combined varieties × seasons

Varietal totals (A to M): 1140.5, 878.0, 1424.5, 1232.0, 1266.5, 1221.5, 1391.0, 1070.0, 1197.5, 1423.5, 1142.0, 1437.5, 1484.5; grand total 16309.0
The yields per variety in this table have been obtained by combining together
the total yields of each variety secured from both the localities per season.

Total sum of squares for varieties × seasons

$$= \frac{(334.0)^2 + (370.0)^2 + \dots + (572.0)^2}{10} - \text{c.f.} = \frac{7126204.00}{10} - \text{c.f.} = 30611.4744.$$

Sums of squares for varieties and seasons
= 12297.1744 and 14475.2784 respectively.

∴ Interaction between varieties × seasons
= S. S. varieties × seasons − (S. S. varieties + S. S. seasons)
= 30611.4744 − (12297.1744 + 14475.2784) = 3839.0216.

Finally, the effect of combined varieties × localities is brought out in Table LII.

The varietal yields represented in that table have been obtained by summing
together all the plot yields, under each variety in the three seasons under consideration,
at Pusa and Karnal respectively.

Total sum of squares for varieties × localities

$$= \frac{(534.0)^2 + (445.0)^2 + \dots + (718.5)^2}{15} - \text{c.f.} = \frac{10439484.50}{15} - \text{c.f.} = 13956.7077.$$

Sums of squares for varieties and localities are 12297.1744 and 345.3564 respectively.
Interaction between varieties × localities
= S. S. varieties × localities − (S. S. varieties + S. S. localities)
= 13956.7077 − (12297.1744 + 345.3564) = 1314.1769.

Interactions between the undermentioned variables have thus been determined:

(1) Blocks × localities

(2) Blocks × seasons

(3) Blocks × varieties

(4) Seasons × localities

(5) Varieties × seasons
and (6) Varieties × localities.

Now the second order interactions, or the interactions between three variables
taken at a time, are to be calculated. They are interactions between:

(1) Localities × seasons × varieties

(2) Localities × seasons × blocks

(3) Localities × blocks × varieties
and (4) Seasons × blocks × varieties.

Table LIII shows the combined effect of localities × seasons × varieties.

























Combined localities × seasons × varieties (excluding block effect)

The varietal totals in this table have been secured by summing together the plot
yields of each variety in all the five blocks in each experiment.

Total sum of squares for localities × seasons × varieties

$$= \frac{(127.5)^2 + (123.5)^2 + \dots + (253.0)^2}{5} - \text{c.f.} = \frac{3616745.00}{5} - \text{c.f.} = 41340.0744.$$

Sums of squares for localities, seasons and varieties are 345.3564, 14475.2784
and 12297.1744 respectively.

Interactions between seasons × localities, varieties × seasons and varieties ×
localities have also been calculated in previous tables and are 5945.6096, 3839.0216
and 1314.1769 respectively.

Interaction between localities × seasons × varieties

= S. S. localities × seasons × varieties − (S. S. localities + S. S. seasons
+ S. S. varieties + interactions between seasons × localities + varieties
× seasons + varieties × localities)

= 41340.0744 − (345.3564 + 14475.2784 + 12297.1744 + 5945.6096 +
3839.0216 + 1314.1769) = 3123.4571
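A second-order interaction is what remains of the three-factor combined sum of squares after the three main effects and the three first-order interactions have been removed. A small Python sketch using the figures quoted in the text; the dictionary labels are assumptions.

```python
ss_combined = 41340.0744          # S. S. localities x seasons x varieties

main_effects = {
    "localities": 345.3564,
    "seasons": 14475.2784,
    "varieties": 12297.1744,
}
first_order = {
    "seasons x localities": 5945.6096,
    "varieties x seasons": 3839.0216,
    "varieties x localities": 1314.1769,
}

# Remove everything already attributed to lower-order terms.
ss_second_order = (ss_combined
                   - sum(main_effects.values())
                   - sum(first_order.values()))

# Degrees of freedom: product of (levels - 1) for 2 localities, 3 seasons, 13 varieties.
df_second_order = (2 - 1) * (3 - 1) * (13 - 1)

print(round(ss_second_order, 4), df_second_order)
```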

In Table LIV, the combined effects of localities × seasons × blocks are
demonstrated.

Table LIV

Combined localities × seasons × blocks (excluding varietal effect)

                                        Blocks                          Seasonal   Locality
Localities   Seasons       1        2        3        4        5        totals     totals
Pusa         1931-32     466.0    434.0    336.5    315.5    337.0      1889.0
             1932-33     615.5    586.0    593.0    630.0    624.0      3048.5
             1933-34     626.0    691.5    690.5    732.5    660.0      3400.5     8338.0
Karnal       1931-32     525.5    516.0    467.0    486.5    442.5      2437.5
             1932-33     489.0    543.0    610.5    609.5    560.0      2812.0
             1933-34     474.0    466.0    539.0    585.0    657.5      2721.5     7971.0
Block totals            3196.0   3236.5   3236.5   3359.0   3281.0     16309.0    16309.0
The block yields in this table have been obtained by summating the plot yields
of all the varieties in each season and in each locality.

Total sum of squares for localities × seasons × blocks

$$= \frac{(466.0)^2 + (434.0)^2 + \dots + (657.5)^2}{13} - \text{c.f.} = \frac{9202648}{13} - \text{c.f.} = 25887.0744.$$

Sums of squares for localities, seasons and blocks are 345.3564, 14475.2784 and
197.7744 respectively.

Interactions between blocks × localities, seasons × localities and blocks ×
seasons are equal to 735.2236, 5945.6096 and 3175.2316 respectively.

Interaction between localities × seasons × blocks

= S. S. localities × seasons × blocks − (S. S. localities + S. S. seasons
+ S. S. blocks + interactions between blocks × localities + seasons ×
localities + blocks × seasons)

= 25887.0744 − (345.3564 + 14475.2784 + 197.7744 + 735.2236 + 5945.6096
+ 3175.2316) = 1012.6004.

The combined effect of localities × blocks × varieties is represented in Table
LV.

The varietal yields in this table have been obtained by combining the plot yields
of each variety in the three seasons for each locality. Thus 101.5 = 26.5 + 37.5 +
37.5.

Total sum of squares for localities × blocks × varieties

$$= \frac{(101.5)^2 + (90.5)^2 + \dots + (151.1)^2}{3} - \text{c.f.} = \frac{2096725.00}{3} - \text{c.f.} = 16899.4077.$$

Sums of squares for localities, blocks and varieties are 345.3564, 197.7744 and
12297.1744 respectively.

Again interactions between blocks × localities, blocks × varieties and varieties ×
localities are 735.2236, 1112.4589 and 1314.1769 respectively.

∴ Interaction between localities × blocks × varieties

= S. S. localities × blocks × varieties − (S. S. localities + S. S. blocks +
S. S. varieties + interactions between blocks × localities + blocks ×
varieties + varieties × localities)

= 16899.4077 − (345.3564 + 197.7744 + 12297.1744 + 735.2236 +
1112.4589 + 1314.1769) = 897.2451

The interaction between seasons × blocks × varieties is tabulated in Table LVI.

Table LV

Combined localities × blocks × varieties (excluding seasonal effect)

Table LVI

Combined seasons × blocks × varieties (excluding locality effects)

Varietal totals (A to M): 1140.5, 878.0, 1424.5, 1232.0, 1266.5, 1221.5, 1391.0, 1070.0, 1197.5, 1423.5, 1142.0, 1437.5, 1484.5; grand total 16309.0

The varietal totals have been obtained in the above table by summing
the plot yields from the two localities per block per season.

Total sum of squares for seasons × blocks × varieties

$$= \frac{(71.0)^2 + (64.0)^2 + \dots + (118.5)^2}{2} - \text{c.f.} = \frac{1438694.00}{2} - \text{c.f.} = 37338.0744.$$

Sums of squares for seasons, blocks and varieties are 14475.2784, 197.7744 and
12297.1744 respectively.

Interactions between blocks × seasons, blocks × varieties and varieties × seasons
are respectively 3175.2316, 1112.4589 and 3839.0216.

∴ Interaction between seasons × blocks × varieties

= S. S. seasons × blocks × varieties − (S. S. seasons + S. S. blocks + S. S.
varieties + interactions between blocks × seasons + blocks × varieties
+ varieties × seasons)

= 37338.0744 − (14475.2784 + 197.7744 + 12297.1744 + 3175.2316
+ 1112.4589 + 3839.0216) = 2241.1351

The sum of squares due to the residual error is equal to the difference between the
total sum of squares for the experiment and the total of all the other sums of
squares.

Degrees of freedom. Altogether there are 390 plots in the whole experiment and
hence the total degrees of freedom are 390 − 1 or 389. These could be divided
further as shown in column 2 of Table LVII. But since the several blocks in
different seasons and localities do not correspond, the block sum of squares is the
sum of items 1, 5, 6 and 12, giving 24 degrees of freedom for "blocks"; similarly
the residual sum of squares for error will be the sum of items 7, 13, 14 and 15. This
will obviously alter the calculations in columns 4, 5 and 6 and the conclusions
drawn therefrom.

The analysis of variance is given in Table LVII.

Table LVII

Analysis of variance

Due to                                    Degrees of   Sums of        Mean         Observed    Critical value
                                          freedom      squares        squares      x           (P = 0.01)
1                                             2            3              4            5            6
 1. Blocks                                    4         197.7744       49.4436      2.410        . .
 2. Localities                                1         345.3564      345.3564     16.8351       . .
 3. Seasons                                   2       14475.2784     7237.6392    352.81         . .
 4. Varieties                                12       12297.1744     1024.7645     49.954        . .
Interactions:
 5. Blocks × localities                       4         735.2236      183.8059      8.9601       . .
 6. Blocks × seasons                          8        3175.2316      396.9039     19.3480       . .
 7. Blocks × varieties                       48        1112.4589       23.1762      1.1298       . .
 8. Seasons × localities                      2        5945.6096     2972.8048      . .          . .
 9. Varieties × seasons                      24        3839.0216      159.9592      7.7976       . .
10. Varieties × localities                   12        1314.1769      109.5147      5.3386       . .
11. Localities × seasons × varieties         24        3123.4571      130.1440      6.3442       . .
12. Localities × seasons × blocks             8        1012.6004      126.5750      . .          . .
13. Localities × blocks × varieties          48         897.2451       18.6926      . .          . .
14. Seasons × blocks × varieties             96        2241.1351       23.3452      1.1380       . .
15. Residual error                           96        1969.3309       20.5139
Total                                       389       52681.0744      135.4269

The mean squares in this table have been calculated for each item by dividing
the respective sums of squares by their appropriate degrees of freedom. The
statistical significance of each mean square is determined by Mahalanobis' x-test by
taking the ratio of each mean square to the mean square for residual error. This
observed ratio $V_1 / V_2$ is shown in column 5 and the expected critical ratios from
Mahalanobis' tables for the P = 0.01 level are placed against each in column 6.

It will be noted that the mean square for blocks is not significant at the one per 
cent level but is significant at the lower level, P — 0*05. The mean square for tiie 
interaction, localities X blocks X varieties, is not significant, while all the other 
variances are definitely significant. This shows that we have, by eliminating the 
effects of different items, definitely brought about an improvement in the precision 
of the experiment. The mean square for error without any eliminations (total error) 
is 135*4269 whereas the residual error after elimination of other effects is only 
20*6178, so that the improvement in precision obtained by using this method of 


analysis is 


136*4269 

= 6*67 times. 

20*6178 


Table LVIII shows the improvement in precision obtained by reducing the experimental
error of the whole experiment by eliminating various items from this. The
mean square for error, without any eliminations of such effects as those brought
about by blocks, seasons, etc., is 135.4269. It is reduced to 68.5552 if effects due to
blocks, localities, seasons and varieties are excluded. If a further elimination of
all interactions of the first order, or those between any two variables taken at a time,
is also made, the mean square error comes down to 33.9477. But if the interactions
of the second order, or those between three variables at a time, are also taken into
account in addition to the above, as has been done in the analysis of variance shown
in Table LVII, the variance for error is finally reduced to 20.6178. The improvement
in precision in each case is shown in the last column of Table LVIII.


Table LVIII

Comparison of experimental errors after the elimination of different effects

Errors                                       Degrees of  Sums of      Mean       Ratio of mean square   Improvement
                                             freedom     squares      squares    error (1) to other     in precision
                                                                                 mean squares

(1) Without elimination of any effects       389         52681.0744   135.4269   ..                     ..
    such as those due to blocks, etc.

(2) After eliminating effects due to         370         25365.4008    68.5552   135.4269 / 68.5552     1.98 times.
    blocks, localities, seasons, and
    varieties.

(3) After eliminating effects due to (2)     272          9233.7096    33.9477   135.4269 / 33.9477     3.98 times.
    and also all interactions taken two
    at a time.

(4) After eliminating effects due to (3)      96          1979.3099    20.6178   135.4269 / 20.6178     6.57 times.
    and also all interactions taken three
    at a time.
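
The ratios in the last column can be checked with a few lines of arithmetic. The sketch below is only an illustration: the stage labels and the helper name `precision_gain` are invented here, and the mean squares are the ones quoted in the text for the four stages of elimination.

```python
# Error mean squares for the four stages of elimination, as quoted in the text.
error_mean_squares = {
    "(1) no elimination": 135.4269,
    "(2) main effects removed": 68.5552,
    "(3) two-factor interactions also removed": 33.9477,
    "(4) three-factor interactions also removed": 20.6178,
}


def precision_gain(baseline, reduced):
    """Ratio of the uncorrected error mean square to a reduced one."""
    return baseline / reduced


baseline = error_mean_squares["(1) no elimination"]
gains = {
    stage: precision_gain(baseline, mean_square)
    for stage, mean_square in error_mean_squares.items()
}

for stage, gain in gains.items():
    print(f"{stage}: {gain:.2f} times")
```

Run as written, the third ratio comes out nearer 3.99 than the 3.98 quoted, a difference in the last figure only.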


Having established that there are significant differences in the yielding power
of the oats in

(1) the two localities,

(2) the three seasons,

and (3) the thirteen varieties under trial,
we have now to draw up tables of mean differences for these variables and compare
the possible pairs of differences with the help of critical differences for the respective
factors under consideration.

The residual variance is 20.6178. Therefore the standard error of the difference
for each comparison is obtained by dividing this number by n, the number of plots
contributing to each total, taking the square root, and multiplying this quantity
by √2.

The critical difference, as usual, is obtained by multiplying the standard error
of the difference by the value of ‘t’ given in Fisher’s tables for the 0.01 or 0.05
levels of significance.
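
As a rough check on this rule, the sketch below computes the standard error of a difference and the corresponding critical difference. The function names are invented for illustration, and the multiplier 2.576 (the large-sample 1 per cent deviate) is an assumption; the text says only that the value is read from Fisher's tables.

```python
import math

RESIDUAL_VARIANCE = 20.6178  # residual mean square quoted in the text


def se_of_difference(variance, n_plots):
    """Standard error of the difference between two means of n_plots each."""
    return math.sqrt(variance / n_plots) * math.sqrt(2)


def critical_difference(variance, n_plots, t_value):
    """Smallest difference between two means judged significant."""
    return t_value * se_of_difference(variance, n_plots)


# Assumption: large-sample 1 per cent value of t.
T_ONE_PER_CENT = 2.576

# 130 plots contribute to each seasonal mean (390 plots over 3 seasons).
se_seasons = se_of_difference(RESIDUAL_VARIANCE, 130)
cd_seasons = critical_difference(RESIDUAL_VARIANCE, 130, T_ONE_PER_CENT)

# 15 plots contribute to each varietal mean within one locality.
se_varieties_one_locality = se_of_difference(RESIDUAL_VARIANCE, 15)

print(round(se_seasons, 4), round(cd_seasons, 4))
print(round(se_varieties_one_locality, 4))
```

With these inputs the seasonal critical difference agrees with the 1.4507 given for the seasons to about the fourth decimal, and the 15-plot standard error reproduces the 1.6580 given for the varietal comparisons at a single locality.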

Table LIX shows that the mean yields per plot, per variety, in the two localities,
Pusa and Karnal, are 42.76 and 40.88 pounds with a mean difference of 1.88 lbs.
The critical difference at the 1 per cent level being only 1.1985 lbs., the observed
difference of 1.88 lbs. is statistically significant and suggests that the oat varieties
generally yielded higher at Pusa than at Karnal.

Table LIX

Mean yield per plot per variety at Pusa and Karnal

Localities      Pusa       Karnal
Pusa            ..         -1.88
Karnal          +1.88      ..
Mean            42.76      40.88

S.E. of difference = √(20.6178 × 2 / 195)
Critical difference at 1% level = 1.1985 lbs.



Table LX shows the differences of mean yields per plot per season and shows
that significantly higher yields were obtained in 1932-33 than in 1931-32 and that
the yields produced in 1933-34 were significantly higher than those secured in the
previous two years.

Table LX

Differences of mean yields per plot, per season,
ignoring the localities

Seasons       1931-32     1932-33     1933-34
1931-32       ..          +11.80      +13.81
1932-33       -11.80      ..          +2.01
1933-34       -13.81      -2.01       ..
Means         33.28       45.08       47.09

S.E. of difference = √(20.6178 × 2 / 130)
Critical difference at 0.01 level = 1.4507.


All the differences in the above table are therefore significant at the 1 per 
cent level. 

The most important comparison, however, and in fact the main aim of the whole
test, was the determination of significant differences in the yielding powers of the
thirteen varieties of oats under observation. Table LXI provides all the possible
sets of differences between mean yields of the varieties, irrespective of seasons and
localities. Differences which are greater than the critical differences for P = 0.01
and which are, therefore, significant at this level are in bold figures, thus +10.76.
Those which are significant only at the P = 0.05 level are in italic figures, thus +3.00.
Differences which are not significant are shown in ordinary figures. It may be
pointed out that, for the sake of convenience, only positive and not negative
significant differences have been shown in bold or in italic figures.


The conclusions are obvious and a detailed recapitulation of the significant 
differences in the above table seems unnecessary. 

The five highest yielders are shown as M, L, C, J, and G which are ranked in this
order but varieties L, C, J, and G are not statistically different from each other and
may all be classed as yielders of the same order. M is statistically superior in yielding
power to G only and not to any other type of these oats. We can safely conclude
perhaps that in C, J and G we have three hybrids in which the high yielding capacity
of the Pusa parents has been combined with plump grain and other good qualities
of the Scotch Potato oats. This is a finding which should prove of immense use
in arranging seed distribution programmes.

Hybrids A, B, H and K, on the other hand, have not done well. Although
hybrids A and B had shown great promise in the preliminary cultural stages and were
specially attractive for their very plump grains, their behaviour in the serial
experiments indicates that these two types must not be distributed to any extent.

Tables LXII and LXIII furnish evidence of the behaviour of these thirteen oats
under Pusa and Karnal conditions respectively. It is interesting to note how some
types have changed places as regards rank and how the early maturing type L
(B. S. 1) has yielded higher at Pusa than M (B. S. 2) which is about a couple of
weeks later in maturity, whereas at Karnal these two oats have yielded in the
reverse order.







Differences of mean yields of different varieties at Pusa, irrespective of seasons

S.E. of mean difference = √(20.6178 / 15 × 2) = 1.6580
Critical difference at the 1% level = 4.2707
Critical difference at the 5% level = 3.8571


Differences of mean yields of different varieties at Karnal, irrespective of seasons

S.E. of mean difference = √(20.6178 / 15 × 2) = 1.6580
Critical difference at the 1% level = 4.2707
Critical difference at the 5% level = 3.8571


CHAPTER XI


SOIL-HETEROGENEITY AND THE ANALYSIS OF COVARIANCE

In theory, field trials should be conducted under uniform conditions of soil and 
culture. This, however, is an unattainable ideal since an absolutely uniform piece 
of land hardly exists in nature and, therefore, methods must be adopted which 
lessen or eliminate the effects of soil-heterogeneity. Fields which from inspection 
appear satisfactorily uniform have been shown by Harris [1915] and others to be 
very heterogeneous in many cases, and the extent to which even a small field can 
show variations in fertility has been demonstrated by the results of uniformity 
trials conducted at Pusa. A field about one-fourth of an acre in area which had 
received uniform cultural and other treatments was sown in three consecutive 
years with Pusa barley, type 21, Pusa 52 wheat and Pusa lentils, type 11, respec- 
tively. At harvest a substantial border was removed from all sides and the remaining 
field was sub-divided into 390 ultimate plots, each 4 feet square (Table LXIV). 
The produce from these ultimate units was harvested and threshed separately. 
Combinations of 2 × 3 such ultimate plots were made for the purpose of drawing
contour maps of the yields, the field being considered as consisting of 65 such
combination plots, 5 plots running from West to East by 13 plots running North to
South as shown in Fig. 20. Assuming that the average yield of each plot is
located at the centre of a plot, the points at which the yields were 10, 20, 30, 40, 
etc., per cent above or below the mean yield of the crop were marked on the field 
plan by interpolation. These points were joined and the contour maps shown 
in the figure given on next page were constructed. 

It is evident that the yield varied greatly between different sub-plots in the
same field. That this heterogeneity was systematic to a considerable extent
is also apparent, as the fertility contour lines run to a very pronounced degree more
or less parallel to the direction of the columns running North to South. The contour
lines show a good deal of similarity in all the three maps shown on next page.
The differences present are due, of course, to the variability in the yielding
power of the different crops under consideration and its relation to the mean
yield of the crop under the conditions of the experiment.

Generally speaking, soil-heterogeneity may exist either as a gradual change of
productivity from one side of the field to the other corresponding to the line of
slope or the pathway of irrigation, etc., or as random patches of ground of higher
or lower fertility. This ‘patchiness’ of a field is due to adjacent plots resembling
each other and represents the most common type of soil-heterogeneity. These
irregularities in the field may sometimes be so powerful as to vitiate the results of
varietal or breeding trials, giving significance to yields in situations where it is not
actually present.

Fig. 20. Contour maps of the yields of (i) barley, (ii) wheat and (iii) lentils obtained in three successive years from the same experimental field.


It must be remembered that the main object of the modern methods of field
trials, and the application of refined methods of analysis to their results, is to lessen,
as far as possible, this effect of soil-heterogeneity. Although Fisher [1932] does
not recommend in general the cropping of the experimental plots in the year previous
to the experiment, as it involves double the labour of the experiment and a year’s
delay before the result is made available, it may be profitable if the experimenter
has some measure of the nature of his field and knows in which direction the soil
fertility varies. He has then the advantage of having valuable information to
assist him in deciding on the lay-out of the experiment and on the correct size and
shape of plots which he should choose. It is also possible for him then to discard
fields of “patchy” fertility for comparative trials. In agricultural experimental
and demonstration stations where the testing of improved varieties of crops or
their response to different manurial or other treatment forms an important
item of work, a knowledge of the heterogeneity of different fields proves very
advantageous.


Harris [1915] has suggested that soil-heterogeneity may be measured by a
coefficient which shows the degree of correlation between the yields of contiguous
plots. Fisher’s analysis of variance may also be employed to determine the drift
in the fertility of the field. Whereas Harris’ method provides a measure of the
heterogeneity present in the whole field, Fisher’s analysis of variance method not only
gives this measure but also clearly sets forth the direction of the fertility gradient
and should, therefore, prove a more comprehensive method for such work.

For the purposes of this chapter, it will be sufficient, perhaps, to illustrate the
calculations of the measure of soil-heterogeneity by these two methods in the
experiment with wheat conducted in 1930-31 at Pusa.


(1) Harris’ method 

The yields of ultimate plots of wheat are shown in Table LXIV. Two series
of groupings, hereafter called the 1 × 5 combination and the 2 × 5 combination
respectively, were made by totalling together the yields of one plot North to South
and five plots East to West in the first case and two plots North to South and
five plots East to West in the second case. Thus out of 390 ultimate plots, 78
combination plots could be made up for the 1 × 5 combination and 39 combination
plots formed for the 2 × 5 combination. The yields of these two series of combination
plots are shown in Tables LXV and LXVI respectively.

Table LXIV


Yield of Pusa 52 wheat from ultimate plots 4 ft. × 4 ft. in area


Total sum of squares of all individual yields,
S(p²) = 26741531.


Mean of ultimate plots, or p̄ =


251.838 gms.

Table LXV

Yield of wheat in 1 × 5 combination plots

              Yield in gms. in Block No.
Row No.        I         II        III
  1           1033      1182       980
  2           1141       963       845
  3           1297      1017      1180
  4           1620      1176      1055
  5           1232      1167      1076
  6           1404      1000       982
  7           1470       954      1120
  8           1568      1376      1067
  9           1700      1299       952
 10           1690      1274      1260
 11           1632      1292      1200
 12           1670      1260      1200
 13           1630      1262      1230
 14           1633      1302      1348
 15           1641      1344      1296
 16           1362      1096      1113
 17           1347      1240       817
 18           1777      1329      1065
 19           1572      1474       896
 20           1613      1277      1210
 21           1629      1347      1215
 22           1333      1316      1146
 23           1432      1392      1117
 24           1321      1360      1040
 25           1205      1410       946
 26           1162      1422       674

Total sum of squares    55669496 + 41160675 + 30846360
i.e., S(Cp²) = 127575531

Table LXVI

Yield of wheat in 2 × 5 combination plots

              Yield in gms. in Block No.
Row No.        I         II        III
 1 and 2      2174      2146      1826
 3 and 4      2817      2192      2235
 5 and 6      2636      2167      2067
 7 and 8      3028      2329      2187
 9 and 10     3290      2573      2212
11 and 12     3202      2642      2400
13 and 14     3263      2664      2678
15 and 16     2893      2439      2408
17 and 18     3124      2669      1882
19 and 20     3185      2751      2106
21 and 22     2962      2662      2361
23 and 24     2763      2762      2167
25 and 26     2367      2832      1619

Total sum of squares    110684070 + 81927483 + 61258640

i.e., S(Cp²) = 253870193.
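
The grouping of ultimate plots into combination plots can be sketched as below. The function name `combine_plots` is invented for illustration, and the grid is a placeholder of unit yields (not the wheat data of Table LXIV); it serves only to show how 390 ultimate plots in 26 rows and 15 columns give 78 groups of 1 × 5 and 39 groups of 2 × 5.

```python
def combine_plots(grid, rows_per_group, cols_per_group):
    """Total the yields of adjacent plots in blocks of the given shape.

    Assumes the grid dimensions are exact multiples of the group shape.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    totals = []
    for top in range(0, n_rows, rows_per_group):
        for left in range(0, n_cols, cols_per_group):
            total = sum(
                grid[i][j]
                for i in range(top, top + rows_per_group)
                for j in range(left, left + cols_per_group)
            )
            totals.append(total)
    return totals


# Placeholder field: 26 rows by 15 columns, every plot given a yield of 1.
placeholder_field = [[1] * 15 for _ in range(26)]

one_by_five = combine_plots(placeholder_field, 1, 5)
two_by_five = combine_plots(placeholder_field, 2, 5)

print(len(one_by_five), len(two_by_five))
```

Replacing the placeholder with the actual ultimate-plot yields would give the entries of the two combination tables row by row.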

The coefficient of correlation between the fertility of contiguous plots is
calculated by the formula

$$ r = \frac{\dfrac{S(C_p^2) - S(p^2)}{m\,[n(n-1)]} - \bar{p}^{\,2}}{\sigma^2} \qquad (44) $$

where p̄ = average yield of all ultimate units;

n = number of units in each group;
m = number of groups;

S(p²) = sum of squares of the yields assigned for ultimate units;

S(Cp²) = sum of squares of the group yields;

σ² = square of the standard deviation of assigned yield for the ultimate units.

The squares of all yields, i.e., of ultimate plots or of combination plots, are
written down and summated and the standard deviation of yields for the ultimate
units is obtained by the formula

$$ \sigma = \sqrt{\frac{S(p^2)}{N} - \bar{p}^{\,2}} \qquad (45) $$

where S(p²) is the sum of squares of the yields of ultimate units, p̄ is the average
yield of all ultimate units and N is the total number of ultimate units (390 here).
By substituting the actual values obtained in this experiment we get:

For 1 × 5 combination:

$$ r = \frac{\dfrac{127575531 - 26741531}{78\,[5(5-1)]} - (251.838)^2}{\dfrac{26741531}{390} - \left(\dfrac{98217}{390}\right)^2} = 0.2361 $$

$$ \text{S.E. of } r = \pm \frac{1 - r^2}{\sqrt{N}} = \pm 0.0468 $$

r = 0.2361 ± 0.0468

For 2 × 5 combination:

$$ r = \frac{\dfrac{253870193 - 26741531}{39\,[10(10-1)]} - (251.838)^2}{\dfrac{26741531}{390} - \left(\dfrac{98217}{390}\right)^2} = 0.2500 $$

$$ \text{S.E. of } r = \pm \frac{1 - r^2}{\sqrt{N}} = \pm 0.04747 $$

r = 0.2500 ± 0.0475
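
The substitution above can be retraced numerically. The sketch below is an illustration only: the function names are invented, the mean is taken as the rounded 251.838 used in the text, and the number of ultimate plots (390) is assumed to be the divisor under the square root in the standard error.

```python
import math

SUM_SQ_ULTIMATE = 26741531     # S(p^2), sum of squared ultimate-plot yields
MEAN_ULTIMATE = 251.838        # mean ultimate-plot yield in grams, as rounded in the text
N_ULTIMATE = 390               # number of ultimate plots


def harris_coefficient(sum_sq_groups, n_per_group, n_groups):
    """Correlation between contiguous plots from grouped sums of squares."""
    variance = SUM_SQ_ULTIMATE / N_ULTIMATE - MEAN_ULTIMATE ** 2
    pair_term = (sum_sq_groups - SUM_SQ_ULTIMATE) / (
        n_groups * n_per_group * (n_per_group - 1)
    )
    return (pair_term - MEAN_ULTIMATE ** 2) / variance


def standard_error(r, n_units=N_ULTIMATE):
    """Standard error of r taken as (1 - r^2) / sqrt(number of units)."""
    return (1 - r ** 2) / math.sqrt(n_units)


r_1x5 = harris_coefficient(127575531, n_per_group=5, n_groups=78)
r_2x5 = harris_coefficient(253870193, n_per_group=10, n_groups=39)
se_2x5 = standard_error(r_2x5)

print(round(r_1x5, 4), round(r_2x5, 4), round(se_2x5, 5))
```

Both coefficients and the standard error for the 2 × 5 grouping agree with the figures in the text. With 390 as the divisor, the same expression gives roughly 0.0478 for the 1 × 5 grouping, a little above the 0.0468 printed.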

It may be pointed out that the large figures in these calculations have been
obtained only because yields were retained in units of grammes. A calculating machine,
the ‘Comptometer’, having been used, no difficulty whatsoever was experienced
in handling these figures. The yields could have been converted into decagrams
or even kilograms and the size of figures for their respective squares could thus
have been reduced.


(2) Fisher’s analysis of variance 

A better method of calculating the variability present in this experimental field
would be to determine the variance between and within columns for the whole
experiment. This would not only furnish a criterion for the amount of variability present
in the field but would also show the nature of its direction.

For the sake of convenience, let us assume that we have 24 ultimate plots in
each column instead of the actual 26 small plots. There are thus 24 × 15 or 360
ultimate plots to consider in this field. The total sum of squares for calculating
the analysis of variance in the experiment is obtained by squaring the yield of each
ultimate plot, summing and subtracting the product of the general total and the
general mean. The sum of squares “between columns” is determined by squaring
the total yield of each of the combination sub-plots or columns, summing, and
dividing by the number of ultimate plots contributing to each of these combination
sub-plots or columns, and subtracting the same product of the general total and
the general mean as was used in obtaining the true total sum of squares. The
sum of squares due to variations “within the columns” is, therefore, the difference
between the total sum of squares and the sum of squares due to variation “between
columns”. As there are 360 ultimate plots in the whole experiment, the total
degrees of freedom will be 359. The degrees of freedom “between columns” will
be 15 - 1 or 14, and those “within columns” will be 359 - 14 or 345; in other
words, the degrees of freedom “within columns” are 23 × 15. The mean squares or
variance for each item can now be calculated by dividing the respective sum of
squares by their appropriate degrees of freedom. The significance of the differences
between and within columns can now be determined by Fisher’s z-test or more
easily by a modification of this test, Mahalanobis’ x-test, which is nothing but
a comparison of the ratio of variance 1 to variance 2.

The following yields were obtained per column, each column consisting of 24
small plots, one square yard each, in area:

Column No.       Yield in grms.
  1              6425
  2              7796
  3              7952
  4              7078
  5              6076
  6              6767
  7              5609
  8              7578
  9              4911
 10              6820
 11              6072
 12              5214
 13              6862
 14              4329
 15              4930

General total    91409
General mean     91409/360 = 253.914

Correction factor = 91409 × 253.914 = 23210024.826
Total sum of squares = 25008515 - (correction factor)
                     = 1798490.174

Sum of squares between columns = (squares of yields of each column summed
together) ÷ 24 (the number of ultimate units in each column) - (correction factor)
= 573791905/24 - (correction factor) = 23907996.04 - (correction factor) = 697971.214.

Table LXVII

Analysis of variance

Variance            Degrees of   Sum of          Mean           Mahalanobis'
                    freedom      squares         squares        x
Between columns      14           697971.214     49855.0864     15.629
Within columns      345          1100518.960      3189.9100
Total               359          1798490.174




The expected ratio x [Mahalanobis, 1933] for n1 = 12 and n2 = ∞ is 2.182 for
P = 0.01. Hence the observed ratio of 15.629 in this experiment is definitely
significant.

From this it may be concluded that the variance between the 15 columns in this 
field is much greater than the variance within columns. In other words, there 
is a greater fertility difference between the columns from West to East than within 
them, i.e., from North to South, a conclusion which is again in conformity with those 
obtained by Harris’ method in the previous pages. 
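
The between-columns and within-columns calculation can be retraced from the totals quoted in the text. The sketch below is illustrative; the variable names are invented, and it follows the text in multiplying the general total by the mean rounded to three decimals when forming the correction factor.

```python
N_COLUMNS = 15
PLOTS_PER_COLUMN = 24
N_PLOTS = N_COLUMNS * PLOTS_PER_COLUMN        # 360 ultimate plots

GENERAL_TOTAL = 91409                         # grams
RAW_SUM_OF_SQUARES = 25008515                 # squared ultimate-plot yields, summed
SUM_SQUARED_COLUMN_TOTALS = 573791905         # squared column totals, summed

# Correction factor: general total times general mean (mean rounded as in the text).
general_mean = round(GENERAL_TOTAL / N_PLOTS, 3)
correction_factor = GENERAL_TOTAL * general_mean

total_ss = RAW_SUM_OF_SQUARES - correction_factor
between_ss = SUM_SQUARED_COLUMN_TOTALS / PLOTS_PER_COLUMN - correction_factor
within_ss = total_ss - between_ss

between_df = N_COLUMNS - 1                    # 14
within_df = N_PLOTS - N_COLUMNS               # 345

between_ms = between_ss / between_df
within_ms = within_ss / within_df
variance_ratio = between_ms / within_ms

print(round(between_ms, 4), round(within_ms, 4), round(variance_ratio, 3))
```

The ratio obtained this way is the quantity compared with the tabulated 1 per cent value.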

The analysis of covariance 

If the results of uniformity trials such as described above are available for a 
particular field, or if the yielding capacities of sub-plots in a field in a preliminary 
trial be known it is possible sometimes to increase the precision of the experimental 
results obtained in any subsequent experiment, conducted in the same field, by 
the application of the analysis of covariance. 

We have seen that Fisher’s analysis of variance and its application to latin
squares or randomized blocks, is one of the best methods of interpreting field results.
The variance due to soil differences, from rows or from columns or that from blocks,
being eliminated in this method, the residual variance, that is, that due to the
chance errors of the experiment, furnishes a better criterion for estimating
significance than the standard deviation calculated without any such elimination. In
the words of Fisher [1932]:

“the real and apparent precision of the comparison is the same as if the experiment
had been performed on land in which the entire rows, and also the entire columns,
were of equal fertility”.

Another step forward was taken by Sanders [1930] in suggesting the application
of the analysis of covariance to results of field trials. This method endeavours
to increase the precision of experimental results on the basis of a knowledge of
the yielding capacities of the same plots in a previous or preliminary trial. Sanders
tried to determine whether soil variations were sufficiently constant, from year to
year, to give useful corrections in the yields of experimental plots, from the yields
of the same plots under previous uniformity trials. Considering the published
results of uniformity trials with cereals, carried out on two fields at Aarslev
(Denmark), during the years 1906 to 1911, he found that while in one field the
precision of the experiment was increased by 150 per cent by utilising the previous
records, and correcting yields by the application of the method of covariance,
yet in another field the plots showed no constancy in yield and he concluded
that previous uniformity trials could not give any assistance in such a case. Eden
[1931], however, obtained results of increased precision with a perennial crop, tea,
by correcting experimental yields on the basis of previous croppings. In the case
of perennial crops the same plants may be used in both the preliminary test and
in the actual experiment and in such cases the analysis of covariance appears to
possess advantages which are not present in its application to annual crops.

Fisher does not recommend, in general, the cropping of the experimental plots
in the year previous to the experiment as it involves double the labour of the
experiment and a year’s delay before the result is made available. He states that “it
seems, therefore, to be always more profitable to lay down an adequately replicated
experiment on untried land than to expend time and labour in exploring
irregularities of its fertility”, and admits that “the chief advantage of the analysis of
covariance lies not in its power of getting the most out of an existing body of data,
but in the guidance it is capable of giving in the design of an observational
programme, and in the choice of which many concomitant observational programmes
shall in fact be recorded”. In plant-breeding stations where the growing of bulk
crops in fields which may be used for future yield trials is a necessary feature, it is
however possible to secure, at little expense, an idea of soil-heterogeneity and to
utilise the result for the correction of future yields.

Vaidyanathan [1933] has recently found that, by using preliminary yields in the
case of a manurial experiment on tea, an improvement in the precision in the
experiment is definitely brought out and that such preliminary yields can be utilized
for designing an improved lay-out for future experiments. Similarly in the design
of experiments on sugarcane in Padegaon Farm (Bombay Presidency) where yield
figures of a previous crop of sunn-hemp, subjected to uniform treatment, were
available, he found that these data might be utilised to the best advantage for designing
subsequent experiments on sugarcane.

As an example of the analysis of covariance we may take the data of Pusa 52
wheat yields given in Table LXIV and data of the yields of type 21 barley in the
preceding season in the same field.

Example 24 . — The application of the analysis of covariance to the yields of 
annual crops, barley and wheat. 

In this experiment there are 390 ultimate plots arranged in 26 rows and 15
columns. For the purpose of the analysis of covariance these ultimate units are
combined to form a 5 × 5 latin square, the 26th row being discarded from the
experiment and the ultimate units combined in groups of 3 × 5. Thus in Table LXIV
the yields of 15 ultimate plots in rows 1 to 5 and columns 1 to 3 form the first
sub-plot of the latin square. The yields of the sub-plots forming the latin square
with barley are given in Table LXVIII; the yields of ultimate units of Pusa
52 wheat are given in Table LXIV and those of the sub-plots forming the latin square
in Table LXXIII. The various steps in the calculation of the analysis of variance
for the preliminary series with barley are shown in Tables LXIX to LXXII.

Table LXVIII

Preliminary yields (barley type 21)
(x-table)

Plot yields in grms.

                          Columns
Rows       A         B         C         D         E         Total
I          1081.5     888.5     813.0     840.5     725.0     4348.5
II         1218.5    1035.0     973.5     817.0     731.0     4775.0
III        1216.0    1076.5    1055.5     825.0     693.5     4866.5
IV         1213.5    1091.5    1026.0     921.5     736.5     4989.0
V          1147.0    1068.0    1067.5     764.5     606.5     4653.5
Total      5876.5    5159.5    4935.5    4168.5    3492.5    23632.5

Mean yield = 23632.5 / 25 = 945.3 grms.


Table LXIX

Deviations from the mean, 945.3 grms.

                          Columns
Rows       A          B         C         D         E          Total
I          +136.2     -56.8     -132.3    -104.8    -220.3     -378.0
II         +273.2     +89.7      +28.2    -128.3    -214.3      +48.5
III        +270.7    +131.2     +110.2    -120.3    -251.8     +140.0
IV         +268.2    +146.2      +80.7     -23.8    -208.8     +262.5
V          +201.7    +122.7     +122.2    -180.8    -338.8      -73.0
Total     +1150.0    +433.0     +209.0    -558.0   -1234.0        0


Table LXX

Squares of deviations
(x²-table)

                              Columns
Rows       A            B           C           D           E            Total
I           18550.44     3226.24    17503.29    10983.04     48532.09     98795.10
II          74638.24     8046.09      795.24    16460.89     45924.49    145864.95
III         73278.49    17213.44    12144.04    14472.09     63403.24    180511.30
IV          71931.24    21374.44     6512.49      566.44     43597.44    143982.05
V           40682.89    15055.29    14932.84    32688.64    114785.44    218145.10
Total      279081.30    64915.50    51887.90    75171.10    316242.70    787298.50


Table LXXI

Sum of squares of preliminary yields

                     Rows                        Columns
                 d          d²               d           d²
                 -378.0     142884.00        +1150.0     1322500.00
                  +48.5       2352.25         +433.0      187489.00
                 +140.0      19600.00         +209.0       43681.00
                 +262.5      68906.25         -558.0      311364.00
                  -73.0       5329.00        -1234.0     1522756.00
Total                       239071.50                    3387790.00
Divided by 5                 47814.30                     677558.00

Total sum of squares = 787298.50.

Table LXXII

Analysis of variance of preliminary yields

Due to        Degrees of   Sums of        Mean
              freedom      squares        squares
Rows           4            47814.30       11953.575
Columns        4           677558.00      169389.500
Error         16            61926.20        3870.3875
Total         24           787298.50       32804.104
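
The steps of Tables LXIX to LXXII can be retraced in a few lines. The sketch below is only an illustration: the function name `row_column_anova` is invented, and the plot yields are those of the preliminary barley series as they follow from the deviations and squares in the tables above.

```python
# Sub-plot yields of barley in grams (rows I to V, columns A to E).
barley = [
    [1081.5, 888.5, 813.0, 840.5, 725.0],
    [1218.5, 1035.0, 973.5, 817.0, 731.0],
    [1216.0, 1076.5, 1055.5, 825.0, 693.5],
    [1213.5, 1091.5, 1026.0, 921.5, 736.5],
    [1147.0, 1068.0, 1067.5, 764.5, 606.5],
]


def row_column_anova(table):
    """Split the total sum of squares into rows, columns and error."""
    size = len(table)
    n_plots = size * size
    mean = sum(map(sum, table)) / n_plots
    total_ss = sum((value - mean) ** 2 for row in table for value in row)
    # Squared deviations of row and column totals, divided by plots per total.
    row_ss = sum((sum(row) - size * mean) ** 2 for row in table) / size
    col_ss = sum((sum(col) - size * mean) ** 2 for col in zip(*table)) / size
    error_ss = total_ss - row_ss - col_ss
    error_df = (n_plots - 1) - 2 * (size - 1)
    return {
        "rows": row_ss,
        "columns": col_ss,
        "error": error_ss,
        "total": total_ss,
        "error_mean_square": error_ss / error_df,
    }


barley_anova = row_column_anova(barley)
for name, value in barley_anova.items():
    print(f"{name}: {value:.4f}")
```

The error line is obtained by difference, exactly as in the hand calculation, so any slip in the row or column sums shows up at once in the residual.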


Similarly, the plot yields in the experimental series with Pusa 52 wheat are shown 
in Table LXXIII and the subsequent tables, viz., Tables LXXIV to LXXVII show 
the various stages in the calculation of the analysis of variance for this series. 


Table LXXIII

Experimental yields in Pusa 52 wheat
(y-table)

Plot yields in grms.

                          Columns
Rows       A         B         C         D         E         Total
I          399.3     334.2     327.7     346.0     279.0     1686.2
II         487.1     412.6     369.1     334.3     297.4     1900.5
III        493.5     433.2     392.4     374.3     378.5     2071.9
IV         476.4     408.6     371.1     380.7     280.8     1917.6
V          440.0     364.0     453.5     326.3     336.9     1920.7
Total     2296.3    1952.6    1913.8    1761.6    1572.6     9496.9

Mean yield = 9496.9 / 25 = 379.876 grms.

Table LXXIV

Deviations from arbitrary mean of 380 grms.

                          Columns
Rows       A         B         C         D         E         Total
I          +19.3     -45.8     -52.3     -34.0    -101.0     -213.8
II        +107.1     +32.6     -10.9     -45.7     -82.6       +0.5
III       +113.5     +53.2     +12.4      -5.7      -1.5     +171.9
IV         +96.4     +28.6      -8.9      +0.7     -99.2      +17.6
V          +60.0     -16.0     +73.5     -53.7     -43.1      +20.7
Total     +396.3     +52.6     +13.8    -138.4    -327.4       -3.1


Table LXXV

Squares of deviations from arbitrary mean (y²-table)

                            Columns
Rows       A           B          C          D          E           Total
I            372.49    2097.64    2735.29    1156.00    10201.00    16562.42
II         11470.41    1062.76     118.81    2088.49     6822.76    21563.23
III        12882.25    2830.24     153.76      32.49        2.25    15900.99
IV          9292.96     817.96      79.21       0.49     9840.64    20031.26
V           3600.00     256.00    5402.25    2883.69     1857.61    13999.55
Total      37618.11    7064.60    8489.32    6161.16    28724.26    88057.45

Table LXXVI

Sum of squares of experimental yields

                       Rows                       Columns
                   d          d²              d          d²
                   -213.8     45710.44        +396.3     157053.69
                     +0.5         0.25         +52.6       2766.76
                   +171.9     29549.61         +13.8        190.44
                    +17.6       309.76        -138.4      19154.56
                    +20.7       428.49        -327.4     107190.76
Total                         75998.55                   286356.21
Divided by 5                  15199.71                    57271.242
Subtract correction               0.3844                      0.3844
Sum of squares                15199.3256                  57270.8576

Crude sum of squares = 88057.45
Correction for average = (3.1)² / 25 = 0.3844
True total sum of squares = 88057.0656



Table LXXVII

Analysis of variance of experimental yields

Due to        Degrees of      Sum of          Mean
              freedom         squares         squares

Rows              4         15199.3256      3799.8314
Columns           4         57270.8576     14317.7144
Error            16         15586.8824       974.18015

Total            24         88057.0656      3669.0444
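The arithmetic of Tables LXXIV to LXXVII can be retraced in a few lines. The sketch below is only an illustration: the plot yields are those of Table LXXIII, read so that they agree with the printed row and column totals (an assumption where the print is unclear), and the variable names are not taken from the text.

```python
# Plot yields (grms.), rows I-V by columns A-E, as in Table LXXIII.
# Assumption: doubtful figures are read so that they agree with the printed totals.
yields = [
    [399.3, 334.2, 327.7, 346.0, 279.0],
    [487.1, 412.6, 369.1, 334.3, 297.4],
    [493.5, 433.2, 392.4, 374.3, 378.5],
    [476.4, 408.6, 371.1, 380.7, 280.8],
    [440.0, 364.0, 453.5, 326.3, 336.9],
]
ARBITRARY_MEAN = 380.0

# Deviations from the arbitrary mean (Table LXXIV).
dev = [[value - ARBITRARY_MEAN for value in row] for row in yields]
n_rows, n_cols = len(dev), len(dev[0])
n_plots = n_rows * n_cols

grand_total = sum(sum(row) for row in dev)            # -3.1
correction = grand_total ** 2 / n_plots               # (3.1)^2 / 25
crude = sum(d * d for row in dev for d in row)        # crude sum of squares

row_totals = [sum(row) for row in dev]
col_totals = [sum(row[j] for row in dev) for j in range(n_cols)]

ss_total = crude - correction
ss_rows = sum(t * t for t in row_totals) / n_cols - correction
ss_cols = sum(t * t for t in col_totals) / n_rows - correction
ss_error = ss_total - ss_rows - ss_cols

for name, ss, df in [("Rows", ss_rows, n_rows - 1),
                     ("Columns", ss_cols, n_cols - 1),
                     ("Error", ss_error, (n_rows - 1) * (n_cols - 1)),
                     ("Total", ss_total, n_plots - 1)]:
    print(f"{name:8s} {df:3d} {ss:12.4f} {ss / df:12.4f}")
```

Working from deviations about 380 rather than from the raw yields keeps the squares small, which is why the text uses an arbitrary mean; the correction term removes the effect of that choice.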


We note that the residual error in the case of the preliminary series (Pusa barley type 21) is 3870.3875 and that in the experimental series (Pusa 52 wheat) the residual error is 974.1801 only.

To obtain an estimate for correcting the residual error of the experimental series on the basis of the error of the preliminary series we use the method of covariance. The principle involved is the use of the regression of experimental yields on previous yields for all the plots concerned. Using the regression equation of the form

$y - bx$

where y = the yield in the experimental series and x = the previous yield, a new variance of y corrected for x is obtained and this new variance satisfies the equation:

$V_{y \cdot x} = V_y - \dfrac{(\mathrm{Cov.}\ xy)^2}{V_x}$    (46)

where Cov. xy is the covariance between x and y. In other words covariance may be defined as the mean product of deviations of the two variates just as variance is the mean product of the deviations of a single variate (that is, the mean square of its deviations). If the produce of an individual plot in one year is any guide to its performance in another, the variance $V_{y \cdot x}$ will naturally give the variance of y corrected by the regression equation $y - bx$, where $b = \mathrm{Cov.}\ xy / V_x$. If n be the number of replicates, the difference between the corrected mean plot yields given by any two treatments will be the difference between two expressions of the form

$\dfrac{1}{n} S(y - bx)$

Table LXXVIII shows S xy. This is done by obtaining the products of deviations in Tables LXIX and LXXIV and summating them.
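The link between the regression form y - bx and equation (46) can be set out in three lines. This is a sketch only, writing S for summation over the plots concerned and taking x and y as deviations from their means (a notation assumed here to match the text):

```latex
\begin{aligned}
S(y - bx)^2 &= S(y^2) - 2b\,S(xy) + b^2\,S(x^2), \\
b &= \frac{S(xy)}{S(x^2)} \qquad \text{(the value of } b \text{ which makes the sum a minimum)}, \\
S(y - bx)^2 &= S(y^2) - \frac{\{S(xy)\}^2}{S(x^2)} .
\end{aligned}
```

Dividing every sum by the same number of observations turns the last line into equation (46). The first line is also the reason why the adjusted sums of squares are later formed by weighting the x², xy and y² entries by b², -2b and 1.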

Table LXXVIII

Products of deviations taken from Tables LXIX and LXXIV (XY-table)

                                  Columns
Rows           A            B            C            D            E        Total XY

I          +2628.66     +2601.44     +6919.29     +3563.20    +22250.30    +37962.89
II        +29259.72     +2924.22      -307.38     +5863.31    +17701.18    +55441.05
III       +30724.45     +6979.84     +1366.48      +685.71      +377.70    +40134.18
IV        +25854.48     +4181.32      -718.23       -16.66    +20712.96    +50013.87
V         +12102.00     -1963.20     +8981.70     +9708.96    +14602.28    +43431.74

Total    +100569.31    +14723.62    +16241.86    +19804.52    +75644.42   +226983.73


The XY products for rows and columns respectively are determined by multiplying the totals for rows or for columns in the preliminary (X) series by the corresponding totals in the experimental (Y) series and summating them.

The sums of squares for rows and columns of the X and Y series have already been determined in Tables LXXI and LXXVI but for convenience may again be included in Table LXXIX in conjunction with the XY values.

Table LXXX summarises these results and in addition has the X², XY and Y² values for residual error, these values being obtained simply by deducting the sums of squares for X and Y and the sums of XY products for rows and columns from the totals.


Table LXXX

Sums of squares and products

Due to        Degrees of        X²             XY             Y²
              freedom

Rows              4           47814.30       21603.11     15199.3256
Columns           4          677558.00      192528.76     57270.8576
Error            16           61926.20       12851.86     15586.8868

Total            24          787298.50      226983.73     88057.0700


We have now to calculate the coefficient of regression by dividing the sum of products for error (the xy value) by the sum of squares for error of the preliminary yields. Thus

b = 12851.86 / 61926.20 = 0.2075.

This correction may be applied either to individual plots or to the composite totals represented by rows, columns, treatments or errors.

To obtain sums of squares for adjusted yields in any line (i.e., either in rows, columns or in errors) we multiply the entries under x², xy and y² in that particular line in Table LXXX respectively by the values of b², -2b and 1 as shown below and add the products and tabulate them as shown in Table LXXXI.

We see that the coefficient of regression, b = 0.2075; therefore b² = 0.0431 and 2b = 0.4150. The adjusted sums of squares, therefore, are:

For rows (47814.30 × 0.0431) + 21603.11 × (-0.4150) + 15199.3256 = 8294.83128.

For columns (677558.00 × 0.0431) + 192528.76 × (-0.4150) + 57270.8576 = 6574.1720.

For error (61926.20 × 0.0431) + 12851.86 × (-0.4150) + 15586.8868 = 12922.3841.
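The weighting by b², -2b and 1 can be checked mechanically. The following sketch takes the sums of squares and products of Table LXXX as given and rounds the coefficients to four places as the text does; the dictionary and function names are illustrative only.

```python
# Sums of squares and products from Table LXXX: (x^2, xy, y^2) for each line.
table_lxxx = {
    "Rows":    (47814.30, 21603.11, 15199.3256),
    "Columns": (677558.00, 192528.76, 57270.8576),
    "Error":   (61926.20, 12851.86, 15586.8868),
}

# Regression coefficient from the error line, rounded as in the text.
x2_error, xy_error, _ = table_lxxx["Error"]
b = round(xy_error / x2_error, 4)     # 0.2075
b_squared = round(b * b, 4)           # 0.0431
two_b = round(2 * b, 4)               # 0.4150

def adjusted_sum_of_squares(x2, xy, y2):
    """Sum of squares of (y - b x): b^2 * x^2 - 2b * xy + y^2."""
    return b_squared * x2 - two_b * xy + y2

adjusted = {name: adjusted_sum_of_squares(*entries)
            for name, entries in table_lxxx.items()}

for name, value in adjusted.items():
    print(f"{name:8s} {value:12.4f}")
```

Carrying b to more decimal places would change the last figures slightly; the rounding is kept here only so that the results agree with the printed ones.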
The following table shows the analysis of adjusted yields.

Table LXXXI
Analysis of adjusted yields

Due to          Degrees of      Sum of         Mean
                freedom         squares        squares

Rows                4          8294.8313     2073.7078
Columns             4          6574.1720     1643.5430
Error              15         12922.3841      861.4923
Regression          1          2664.5027     2664.5027


Comparing this analysis of adjusted yields with the analysis of variance for experimental yields shown in Table LXXVII, the most striking change observed is the reduction of the mean square for error from 974.18015 to 861.4923, in spite of the reduction in the degrees of freedom appropriate to it, showing thereby that the precision of the comparison has been increased by taking the preliminary yields into consideration.

Thus:

(Mean square for error in Table LXXVII) / (Mean square for error in Table LXXXI) = 974.180 / 861.492 = 1.13

In this case it may be pointed out that the ratio between the two mean squares considered is only 1.13.
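The mean squares of the adjusted analysis and the gain in precision follow directly from the adjusted sums of squares. In the sketch below the sum of squares for regression is taken as the difference between the unadjusted and adjusted error sums of squares; that reading is an assumption, though it agrees with the printed figure.

```python
# Adjusted sums of squares and their degrees of freedom (Table LXXXI).
adjusted_ss = {"Rows": 8294.8313, "Columns": 6574.1720, "Error": 12922.3841}
degrees = {"Rows": 4, "Columns": 4, "Error": 15}   # error loses 1 d.f. to regression

mean_square = {name: adjusted_ss[name] / degrees[name] for name in adjusted_ss}

# Assumption: regression sum of squares = unadjusted error - adjusted error.
unadjusted_error_ss = 15586.8868                   # error line of Table LXXX
regression_ss = unadjusted_error_ss - adjusted_ss["Error"]

unadjusted_error_ms = 974.18015                    # error line of Table LXXVII
gain_in_precision = unadjusted_error_ms / mean_square["Error"]

print(f"Error mean square {mean_square['Error']:.4f}")
print(f"Regression        {regression_ss:.4f}")
print(f"Gain in precision {gain_in_precision:.2f}")
```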

In perennial crops such as tea, Eden [1931] has secured an increased precision of 6.81 times in comparison with the experimental error existing without the use of the analysis of covariance. Recently, Vaidyanathan [193?] has similarly obtained an improvement of about 16 times in the precision of tea results of the Tocklai Tea Experimental Station by using preliminary data.

It may be pointed out that the application of the method of covariance for correcting experimental yields on the basis of the previous cropping gives more satisfactory results in the case of perennial crops, e.g., tea, than in the case of annuals, e.g., wheat, barley, etc. Vaidyanathan's results with the tea figures obtained from the Tocklai Station may be taken as an example of the application of the method of covariance in improving the precision of results in perennial crops.

Example 25. The application of analysis of covariance to tea results. This experiment was carried out in 6 blocks with 4 plots in each block.

Table LXXXII shows the preliminary yields of tea obtained at Tocklai, and during this preliminary period hypothetical treatments (A, B, C, D) have been assumed to exist and to be distributed in the field on the same lay-out as during the final experiment.

Table LXXXII

Preliminary yields of tea (non-manurial treatment) (X-table)

Plot yields

Blocks        A        B        C        D      Total

I            134      118       99      104      455
II           133      105      138      142      518
III          103      129      129      143      504
IV           104      126      104       79      413
V            143      127      142      127      539
VI           109      113      111      127      460

Total        726      718      723      722     2889


The analysis of variance of the preliminary yields is shown in Table LXXXIII.

Table LXXXIII

Analysis of variance of preliminary yields

Due to                       Degrees of     Sum of        Mean
                             freedom        squares       square

Blocks                           5         2750.375      550.075
Hypothetical treatments          3            5.458        1.82
Error                           15         3959.792      263.99

Total                           23         6715.625      291.98
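The analysis of the preliminary yields can be reproduced from the plot figures. The sketch below assumes the yields of Table LXXXII as read against the printed totals (the entry for block IV, treatment C is taken as 104, the value required by its row and column totals); the names are illustrative.

```python
# Preliminary tea yields: rows are blocks I-VI, columns are treatments A-D.
# Assumption: block IV, treatment C = 104, as required by the printed totals.
preliminary = [
    [134, 118,  99, 104],
    [133, 105, 138, 142],
    [103, 129, 129, 143],
    [104, 126, 104,  79],
    [143, 127, 142, 127],
    [109, 113, 111, 127],
]
n_blocks, n_treatments = len(preliminary), len(preliminary[0])
n_plots = n_blocks * n_treatments

grand_total = sum(sum(row) for row in preliminary)
correction = grand_total ** 2 / n_plots

block_totals = [sum(row) for row in preliminary]
treatment_totals = [sum(row[j] for row in preliminary) for j in range(n_treatments)]

ss_total = sum(v * v for row in preliminary for v in row) - correction
ss_blocks = sum(t * t for t in block_totals) / n_treatments - correction
ss_treatments = sum(t * t for t in treatment_totals) / n_blocks - correction
ss_error = ss_total - ss_blocks - ss_treatments

print(f"Blocks     {ss_blocks:10.3f}")
print(f"Treatments {ss_treatments:10.3f}")
print(f"Error      {ss_error:10.3f}")
print(f"Total      {ss_total:10.3f}")
```

The very small sum of squares for the hypothetical treatments is what one would hope for, since no treatments had yet been applied.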
The experimental yields are shown in Table LXXXIV below:

Table LXXXIV

Experimental yields of tea (with 4 manurial treatments in the experiment) (Y-table)

Plot yields

                       Treatments
Blocks        A        B        C        D      Total

I            135      110       98      117      460
II           136      107      145      175      563
III          102      125      138      181      546
IV           108      136      114       96      454
V            149      116      164      144      573
VI           110      110      120      152      492

Total        740      704      779      865     3088


The analysis of variance of this series is shown in Table LXXXV given below:

Table LXXXV

Analysis of variance of experimental yields

Due to          Degrees of      Sum of        Mean
                freedom         squares       square

Blocks              5           3475.83       695.17
Treatments          3           2391.00       797.00
Error              15           7222.50       481.50

Total              23          13089.33       569.10


The various steps of calculation have been shown in great detail in the previous example and have, therefore, been omitted in the present case. The analysis of adjusted yields is shown in Table LXXXVI.
Table LXXXVI

Analysis of variance of adjusted yield (y - bx)

Due to        Degrees of      y²            b²x²                -2bxy            y² + b²x² - 2bxy     Mean
              freedom                                                                                  square

Block             5         3475.83    2750.37 × 1.714    -2978.75 × 2.6184          390.414           78.08
Treatments        3         2391.00       5.46 × 1.714      -25.17 × 2.6184         2334.450          778.15
Error            14*        7222.50    3959.79 × 1.714    -5184.08 × 2.6184          435.588           31.11

Total            22                                                                  3160.452

* After allowing 1 degree for linear regression.

Improvement in precision by using the preliminary data:

(Mean square for 'error' in Table LXXXV) / (Mean square for 'error' in Table LXXXVI) = 481.5 / 31.11 = about 15.5 times.
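Since the intermediate steps are omitted for the tea example, the whole covariance adjustment can be sketched from the two yield tables. The plot figures below are read so as to agree with the printed totals (an assumption where the print is doubtful), the helper name is illustrative, and b is carried unrounded, so the last figures differ slightly from the printed table.

```python
# Tea yields per plot: rows are blocks I-VI, columns are treatments A-D.
# Assumption: entries are read so that they agree with the printed totals.
x = [  # preliminary yields (Table LXXXII)
    [134, 118, 99, 104], [133, 105, 138, 142], [103, 129, 129, 143],
    [104, 126, 104, 79], [143, 127, 142, 127], [109, 113, 111, 127],
]
y = [  # experimental yields (Table LXXXIV)
    [135, 110, 98, 117], [136, 107, 145, 175], [102, 125, 138, 181],
    [108, 136, 114, 96], [149, 116, 164, 144], [110, 110, 120, 152],
]

def sums_of_products(u, v):
    """Split S(uv) about the means into blocks, treatments and error."""
    n_blocks, n_treat = len(u), len(u[0])
    correction = sum(map(sum, u)) * sum(map(sum, v)) / (n_blocks * n_treat)
    total = sum(a * b for ru, rv in zip(u, v) for a, b in zip(ru, rv)) - correction
    blocks = sum(sum(ru) * sum(rv) for ru, rv in zip(u, v)) / n_treat - correction
    treat = sum(sum(cu) * sum(cv)
                for cu, cv in zip(zip(*u), zip(*v))) / n_blocks - correction
    return {"Blocks": blocks, "Treatments": treat, "Error": total - blocks - treat}

xx = sums_of_products(x, x)      # sums of squares of x
xy = sums_of_products(x, y)      # sums of products
yy = sums_of_products(y, y)      # sums of squares of y

b = xy["Error"] / xx["Error"]    # regression coefficient from the error line
adjusted = {k: yy[k] - 2 * b * xy[k] + b * b * xx[k] for k in yy}

error_df = (len(x) - 1) * (len(x[0]) - 1)          # 15
ms_before = yy["Error"] / error_df
ms_after = adjusted["Error"] / (error_df - 1)      # one d.f. allowed for regression
gain = ms_before / ms_after

print(f"b = {b:.4f}")
print(f"error mean square {ms_before:.2f} -> {ms_after:.2f}, gain = {gain:.1f}")
```

Passing the same table twice to the helper gives the ordinary sums of squares, so one function serves for x², xy and y² alike.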


The above table of adjusted yield obtained after allowing for linear regression shows that by combining the 'preliminary' and 'experimental' analysis the standard error of the experiment is considerably reduced, and that the improvement in precision is nearly 16 times what it would otherwise be by analysing the experimental data alone. Where preliminary yields of experimental plots can be secured, it is possible, therefore, to obtain greater precision of results by applying the method of covariance.
LOGARITHMS

The ordinary processes of arithmetical computation are greatly simplified by the use of logarithms. Logarithms are a practical application of the well-known algebraic rule

$x^a \times x^b = x^{a+b}$
and $x^a \div x^b = x^{a-b}$.

It is possible to express any number as a power of another number, e.g.,

$64 = 8^2 = 4^3$.

In this case, 64 is the square or second power of 8 and the cube or third power of 4. If we desired to express 64 as a power of 10 we should get an index (power) of less than 2, since 10 is larger than 8. Actually it is found by calculation that $64 = 10^{1.8062}$, and 1.8062 is called the logarithm of 64 to the base 10.

Logarithms of numbers are powers to the base 10, thus:

100 = 10², therefore log. 100 = 2.0.

1,000 = 10³, therefore log. 1000 = 3.0.

10,000 = 10⁴, therefore log. 10000 = 4.0.

Tables of logarithms have been constructed by mathematicians for all numbers up to 10,000 and are of enormous use in shortening the processes of calculation. The logarithm of a number consists of an integral part called the characteristic and a decimal part, the mantissa.

For example, from a table of logarithms we find that the logarithm of 5184.0 is 3.7147. In this case 3 is the characteristic and 0.7147 is the mantissa. In the case of numbers greater than unity the characteristic is always one less than the number of figures to the left of the decimal point. Thus:

log. 5184.0 = 3.7147.

log. 518.4 = 2.7147.

log. 51.84 = 1.7147.

log. 5.184 = 0.7147.

The logarithm of a number less than unity is negative but the mantissa is kept positive by making the characteristic negative and 1 more than the number of zeros which follow the decimal point.

log. 0.5184 = -1 + 0.7147 or $\bar{1}.7147$

log. 0.05184 = -2 + 0.7147 or $\bar{2}.7147$

If we have to multiply 64 × 81 by the use of logarithms in practice we proceed as follows:

log. 64 = 1.8062
log. 81 = 1.9085

log. (64 × 81) = 3.7147

From a table of antilogarithms we can find the number corresponding to a given logarithm. In this way we identify 7147 as the logarithm of 5184, and since the characteristic in this case is 3 the number required must be between 10³ and 10⁴, i.e., between 1,000 and 10,000. Therefore 64 × 81 = 5184.0.
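The multiplication of 64 by 81 can be followed step by step with computed logarithms standing in for the printed table. This is a small sketch; the function name is illustrative and not part of the text.

```python
import math

def characteristic_and_mantissa(value):
    """Split log10(value) into an integral part and a positive decimal part."""
    logarithm = math.log10(value)
    characteristic = math.floor(logarithm)
    return characteristic, logarithm - characteristic

# Multiplication by adding logarithms.
log_64 = math.log10(64)
log_81 = math.log10(81)
log_product = log_64 + log_81
product = 10 ** log_product          # the antilogarithm

print(round(log_64, 4), round(log_81, 4), round(log_product, 4))
print(round(product, 1))

# The same mantissa serves for 5184, 518.4, 51.84, ... and for 0.05184.
for number in (5184.0, 518.4, 51.84, 5.184, 0.5184, 0.05184):
    c, m = characteristic_and_mantissa(number)
    print(number, c, round(m, 4))
```

Using `math.floor` rather than truncation is what keeps the mantissa positive for numbers less than unity, in agreement with the bar notation of the text.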
In the case of division, for 81/64 we get

log. (81/64) = log. 81 - log. 64
or (1.9085 - 1.8062) = 0.1023;
but 0.1023 = log. 1.266,
therefore 81/64 = 1.266.

Example 1. Multiply 47.5620 by 0.003710.

log. 47.5620 = 1.67726
and log. 0.003710 = $\bar{3}.56937$

log. (47.5620 × 0.003710) = $\bar{1}.24663$
= log. 0.17645
i.e., 47.5620 × 0.003710 = 0.17645.

Example 2. Divide 0.0516 by 33.6210.

log. 0.0516 = $\bar{2}.71265$
log. 33.6210 = 1.52661

log. (0.0516 / 33.6210) = $\bar{3}.18604$
= log. 0.0015347
0.0516 / 33.6210 = 0.0015347

Example 3. Divide 0.0176 by 0.008437.

log. 0.0176 = $\bar{2}.24550$
log. 0.008437 = $\bar{3}.92619$

log. (0.0176 / 0.008437) = 0.31931
= log. 2.08595
0.0176 / 0.008437 = 2.08595

Example 4. Divide 0.008437 by 0.0176.

log. 0.008437 = $\bar{3}.92619$
log. 0.0176 = $\bar{2}.24550$

log. (0.008437 / 0.0176) = $\bar{1}.68069$
= log. 0.4794
0.008437 / 0.0176 = 0.4794

Example 5. Divide 0.0176 by 0.0516.

log. 0.0176 = $\bar{2}.24550$
log. 0.0516 = $\bar{2}.71265$

log. (0.0176 / 0.0516) = $\bar{1}.53285$
= log. 0.34108
0.0176 / 0.0516 = 0.34108
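The handling of negative characteristics in these examples can be sketched as follows. Computed logarithms are used in place of the tables, so the fifth decimal may differ by a unit from the printed values; the function names are illustrative.

```python
import math

def bar_form(value):
    """Return (characteristic, mantissa) of log10(value), mantissa kept positive."""
    logarithm = math.log10(value)
    characteristic = math.floor(logarithm)
    return characteristic, logarithm - characteristic

def log_of_quotient(numerator, denominator):
    """Subtract the two logarithms and express the result in bar form."""
    c_num, m_num = bar_form(numerator)
    c_den, m_den = bar_form(denominator)
    difference = (c_num + m_num) - (c_den + m_den)
    characteristic = math.floor(difference)
    return characteristic, difference - characteristic

# Example 2: 0.0516 divided by 33.6210.
c, m = log_of_quotient(0.0516, 33.6210)
quotient = 10 ** (c + m)
print(c, round(m, 5), quotient)

# Example 1: the factor 0.003710 has characteristic -3 (written bar-3).
print(bar_form(0.003710))
```

A characteristic of -3 with mantissa 0.18604 is what the text writes with a bar over the 3; only the characteristic is negative, never the mantissa.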
The calculations of square roots and the powers of numbers are greatly simplified by the use of logarithms. Students must remember that

$\log a^n = n \log a$

i.e., log. (64)² = 2 log. 64
= 2 × 1.8062
= 3.6124
= log. 4096;

and again that

$a^{1/2} \times a^{1/2} = a$
$\sqrt{a} = a^{1/2}$

$\log \sqrt{a} = \tfrac{1}{2} \log a$

or log. $\sqrt{64}$ = ½ × 1.8062
= 0.9031
= log. 8.

Similarly

$a^{1/3} \times a^{1/3} \times a^{1/3} = a$
$\sqrt[3]{a} = a^{1/3}$

$\log \sqrt[3]{a} = \tfrac{1}{3} \log a$

and (a4)3 = at = (/«/«)'* 

.*. log. (a^)® = 3 X log. a. 
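The rules for powers and roots can be tried on the number 64 used in the text. A minimal sketch, with computed logarithms in place of the table:

```python
import math

log_64 = math.log10(64)

square = 10 ** (2 * log_64)        # log a^n = n log a, with n = 2
square_root = 10 ** (log_64 / 2)   # log of a square root = half the logarithm
cube_root = 10 ** (log_64 / 3)     # log of a cube root = one third of the logarithm

print(round(2 * log_64, 4), round(square))
print(round(log_64 / 2, 4), round(square_root, 4))
print(round(log_64 / 3, 4), round(cube_root, 4))
```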

The SLIDE RULE is a mechanical device of which the two principal components slide over one another and are each graduated in a logarithmic scale. This allows the determination of simple products, quotients, and roots even more rapidly than with a table of logarithms. Students should familiarise themselves with the use of the slide rule. A 20-inch rule is accurate to four figures and is sufficient for most of the statistical computations met with in biological problems.

NAPERIAN LOGARITHMS. The logarithms described above are calculated to the base 10 and are those which are universally employed for ordinary computations. In higher mathematics and in certain cases in statistics Naperian logarithms, which are calculated to the base 2.71828, are used. It is not necessary here to give details of the theory of Naperian logarithms; it is sufficient to state that the number 2.71828 is an incommensurable number generally indicated by the symbol e, and is the summation of the series

$e = 1 + \dfrac{1}{1} + \dfrac{1}{1 \cdot 2} + \dfrac{1}{1 \cdot 2 \cdot 3} + \dfrac{1}{1 \cdot 2 \cdot 3 \cdot 4} + \dots$

which, when worked out to the first six decimal places, gives the result 2.718282.

Tables of logarithms to the base e are not always available but common logarithms to the base 10 can be converted into Naperian logarithms by a simple equation:

$\log_{10} a = \log_e a \times 0.4343$

$\log_e a = \dfrac{\log_{10} a}{0.4343} = \log_{10} a \times 2.3026$

where a = any number.

The conversion figure 0.4343 is called the modulus of the common system of logarithms.
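Both the series for e and the conversion by the modulus are easily verified. The sketch below sums the series term by term and obtains the modulus as the reciprocal of log_e 10; the function names are illustrative.

```python
import math

def e_from_series(terms):
    """Sum 1 + 1/1 + 1/(1*2) + 1/(1*2*3) + ... for the given number of terms."""
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term /= k            # term is now 1 / (1*2*...*k)
        total += term
    return total

modulus = 1 / math.log(10)       # modulus of the common system, 0.4343

def common_to_naperian(log10_value):
    """log_e a from log_10 a."""
    return log10_value / modulus

def naperian_to_common(ln_value):
    """log_10 a from log_e a."""
    return ln_value * modulus

print(round(e_from_series(12), 6))
print(round(modulus, 4), round(1 / modulus, 4))
print(round(common_to_naperian(1.8062), 4))   # log_e 64 from log_10 64
```

Twelve terms already fix the first six decimal places, which shows how quickly the series converges.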

Example 6 shows the method of obtaining the logarithms to the base e, by the use of a 
table of Naperian logarithms, of a series of numbers in which the same digits occur with different 
decimal values. 
Example 6.

(a) Find log_e 5476.12.

log_e (5.47612 × 1000) = log_e 5.47612 + log_e 10³
log_e for the number 5.47612 is given by the summation below:

1.69928
0.00110
0.000018
0.0000037
---------
1.7004017

From the table, log_e 10³ = 6.907760

Therefore log_e 5476.12 = 1.70040 + 6.90776 = 8.60816.

(b) Find log_e 547.612.

log_e for the number 5.47612 = 1.70040 as shown above.
log_e 10² = 4.60517

Therefore log_e 547.612 = 1.70040 + 4.60517 = 6.30557.

(c) Find log_e 54.7612.

log_e 10¹ = 2.30258

Therefore log_e 54.7612 = 4.00298.

(d) Find log_e 5.47612.

The required logarithm is the logarithm of the number, 1.70040.

(e) Find log_e 0.547612.

log_e 10⁻¹ = $\bar{3}.697410$
log_e 5.47612 = 1.70040

log_e 0.547612 = $\bar{1}.39781$

(f) Find log_e 0.0547612.

log_e 10⁻² = $\bar{5}.394830$
log_e of number = 1.70040

log_e 0.0547612 = $\bar{3}.09523$.
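The six parts of Example 6 all rest on adding a multiple of log_e 10 to the logarithm of 5.47612. A brief sketch, with computed values in place of the Naperian table and illustrative function names:

```python
import math

LN_10 = math.log(10)             # 2.30258...

def naperian_log(significant, power_of_ten):
    """log_e(significant * 10**power_of_ten), built up as in Example 6."""
    return math.log(significant) + power_of_ten * LN_10

def bar_form(logarithm):
    """Characteristic and positive decimal part of a logarithm."""
    characteristic = math.floor(logarithm)
    return characteristic, logarithm - characteristic

for power in (3, 2, 1, 0, -1, -2):
    value = naperian_log(5.47612, power)
    characteristic, decimal_part = bar_form(value)
    print(power, round(value, 5), characteristic, round(decimal_part, 5))
```

For the two numbers less than unity the characteristic comes out negative and the decimal part positive, corresponding to the barred figures in parts (e) and (f).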


INTERPOLATION 

A function y = f(x) can be evaluated accurately, provided f(x) is an algebraic expression involving squaring, addition, subtraction, multiplication and division. But if y = log. x, the value of y cannot be so easily calculated. In such cases where the value of y cannot be got by the performance of a definite number of finite simple arithmetical operations, we are forced to have recourse to a table which gives the values of y corresponding to certain convenient values of x. Here we select a number of values $x_1$, $x_2$, etc. for x and tabulate the corresponding values of y. Let us assume that the logarithms of numbers 1, 2, ..., 100 have been tabulated. The question arises as to how we can find the values of log. x intermediate between any two of the numbers of the table. The answer to this question is given by the theory of interpolation which in its elementary aspect is the science of reading between the lines of a mathematical table.

Again it sometimes happens in statistical records that gaps occur which it is desirable to fill in. This may be due to lack of sufficient observations or to the destruction of old records. If we have a frequency distribution in which a few data are missing, we can by the method of interpolation supply the missing data to a fair degree of approximation.
A short example will suffice to illustrate the principle involved. Suppose we want to find the logarithm of 255.475. From logarithmic tables, we get that log_10 255 = 2.40654 and log_10 256 = 2.40824. If the logarithms and the numbers are taken as the two variables and if we plot these on a graph paper, we will get a logarithmic curve. But for all practical purposes the portion of the curve between 255 and 256 can be considered to be a straight line and the value of the logarithm with respect to the base 10 for 255.475 can be written down by applying the simple method of proportions.

$\log_{10} 255.475 = 2.40654 + 0.475 \times \dfrac{2.40824 - 2.40654}{256 - 255} = 2.40654 + 0.0008075 = 2.4073475$

Now if we have got a table which gives the logarithms of 255.4 and 255.5 we can, by using the same method, get the value of log. 255.475.

log_10 255.4 = 2.40722
log_10 255.5 = 2.40739

As before

$\log_{10} 255.475 = 2.40722 + 0.075 \times \dfrac{2.40739 - 2.40722}{255.5 - 255.4} = 2.40722 + 0.0001275 = 2.4073475$

It may be noted that the values of log. 255.475 got by the two methods are identical in this case. But in some cases there may be slight differences, because the portion of the curve between the two values of x need not be a straight line. It may be a curve, in which case some corrections will have to be applied. But for all practical purposes for using the logarithmic and other tables, we can assume that the corrections are so small that they can be neglected safely, and the method indicated above, on the assumption that a linear relation exists between two variables for a particular small range, can be adopted for interpolation.
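The method of proportions amounts to a single formula, sketched below with the tabular values quoted in the text; the function name is illustrative.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation: value at x on the straight line through (x0, y0), (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# From the logarithms of 255 and 256.
from_unit_table = interpolate(255.475, 255, 2.40654, 256, 2.40824)

# From the logarithms of 255.4 and 255.5.
from_tenth_table = interpolate(255.475, 255.4, 2.40722, 255.5, 2.40739)

print(round(from_unit_table, 7), round(from_tenth_table, 7))
```

The closer the two tabular arguments lie together, the better the straight line represents the curve between them, so the finer table is to be preferred where the two results differ.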
APPENDIX II

IMPORTANT FORMULAE


samples 


Aver. dev. = 


G. V. = 


n 

CT X 100 


P* E, Alean 
jP . E. cs = 


M 

■v^ n 
0-6745a 


\/ 2n 

P. E. c. V. = 0-6745 Fx ^1+2 

I 

±0-6745 (1— r^) 




2n 


P.E.r = 


^ n 

P. E. diff. == ^ ^^2 

S, E, ao.—2) = JS.i)2 + {8. E.^Y • 


P . E. average = ■+ 6^ + . 


-r^ -r-, n / PX ^7 

P. E. of any probability = 0-6745 / — - — 


P. E. oi a class frequency = 0-6745 q, n. 
S (/. Sr. Sy) 




A. aa:. ay 

S(Ar) 


^xy — 




(A. A.) 


y s(A2) _ - 

y-v 



Pormula 

number 

Page 

• 

(1) 

9 

• 

(2) 

14 

• 

(24) 

54 

# 

(3) 

15 

» 

(4) 

15 

• 

(7) 

27 

• 

(8) 

27 

• 

(9) 

28 

• 

(33) 

67 

• 

(10) 

29 

• 

(19) 

48 


(11) 

31 

e 

(13) 

32 


(14) 

32 

A 

(32) 

65 

* 

(36) 

38 
Formula

number 

Page 

2 S(0— C)2 

^ c • . . 

Student’s Z = 

Md. 

(15) 

33 

S - . 

1 • 

;• 


/ 

V n 

(One sample). 

(21) 

61 

Fisher’s z = 1/2 log,e ~ . 


f * * 

103 

» 

• 

Fisber’s t == 

/ 1 , 1 

(23) 

63 

Mahalanobis’ * - 

Varianceg 

sj n'Jn^ , 

(two samples), 
8 being given by formula 

• » 

104 

Mahaknohk’/- _ %-»»2 


(24) 

(30) 


/W + ih) 

2 


o2 

<J 2 





/=< /i • . . . 

V % 

• * • 

• * • 

(31) 

63 

Critical difference d ~ t x S. E. . 

• • « 

* • * 

(27) 

68 


Insert Formula (25) t = 





Degrees of freedom for smaller mean square 
APPENDIX III

Snedecor's Tables for the Values of F (Ratio of Variance₁ to Variance₂) and Fisher's t for Different Degrees of Freedom

Snedecor's F is the same as Mahalanobis' x and is the ratio of Variance₁ to Variance₂.

Snedecor, G. W. (1934). Calculation and Interpretation of Analysis of Variance and Covariance. Iowa, pp. [illegible].